PREFACE



Digital systems are created to perform data processing and control tasks. What distinguishes one 
system from another is an architecture tailored to efficiently execute the tasks for which it was de- 
signed. A desktop computer and an automobile’s engine controller have markedly different attributes 
dictated by their unique requirements. Despite these differences, they share many fundamental 
building blocks and concepts. Fundamental to digital system design is the ability to choose from and 
apply a wide range of technologies and methods to develop a suitable system architecture. Digital 
electronics is a field of great breadth, with interdependent topics that can prove challenging for indi- 
viduals who lack previous hands-on experience in the field. 

This book’s focus is explaining the real-world implementation of complete digital systems. In doing
so, it prepares the reader to begin design and implementation work immediately, without being
left to wonder about the myriad ancillary topics that many texts leave to independent and sometimes 
painful discovery. A complete perspective is emphasized, because even the most elegant computer 
architecture will not function without adequate supporting circuits. 

A wide variety of individuals are intended to benefit from this book. The target audiences include 

• Practicing electrical engineers seeking to sharpen their skills in modern digital system design. 
Engineers who have spent years outside the design arena or in less-than-cutting-edge areas often 
find that their digital design skills are behind the times. These professionals can acquire directly 
relevant knowledge from this book’s practical discussion of modern digital technologies and design
practices.

• College graduates and undergraduates seeking to begin engineering careers in digital electronics. 
College curricula provide a rich foundation of theoretical understanding of electrical principles 
and computer science but often lack a practical presentation of how the many pieces fit together in 
real systems. Students may understand conceptually how a computer works while being incapable 
of actually building one on their own. This book serves as a bridge to take readers from the theo- 
retical world to the everyday design world where solutions must be complete to be successful. 

• Technicians and hobbyists seeking a broad orientation to digital electronics design. Some people 
have an interest in understanding and building digital systems without having a formal engineer- 
ing degree. Their need for practical knowledge in the field is as strong as for degreed engineers, 
but their goals may involve laboratory support, manufacturing, or building a personal project. 

There are four parts to this book, each of which addresses a critical set of topics necessary for 
successful digital systems design. The parts may be read sequentially or in arbitrary order, depend- 
ing on the reader’s level of knowledge and specific areas of interest. 

A complete discussion of digital logic and microprocessor fundamentals is presented in the first 
part, including introductions to basic memory and communications architectures. More advanced 
computer architecture and logic design topics are covered in Part 2, including modern microproces- 
sor architectures, logic design methodologies, high-performance memory and networking technolo- 
gies, and programmable logic devices. 
Part 3 steps back from the purely digital world to focus on the critical analog support circuitry
that is important to any viable computing system. These topics include basic DC and AC circuit 
analysis, diodes, transistors, op-amps, and data conversion techniques. The fundamental topics from 
the first three parts are tied together in Part 4 by discussing practical digital design issues, including 
clock distribution, power regulation, signal integrity, design for test, and circuit fabrication tech- 
niques. These chapters deal with nuts-and-bolts design issues that are rarely covered in formal elec- 
tronics courses. 

More detailed descriptions of each part and chapter are provided below. 



PART 1 DIGITAL FUNDAMENTALS 



The first part of this book provides a firm foundation in the concepts of digital logic and computer 
architecture. Logic is the basis of computers, and computers are intrinsically at the heart of digital 
systems. We begin with the basics: logic gates, integrated circuits, microprocessors, and computer 
architecture. This framework is supplemented by exploring closely related concepts such as memory 
and communications that are fundamental to any complete system. By the time you have completed 
Part 1, you will be familiar with exactly how a computer works from multiple perspectives: individ- 
ual logic gates, major architectural building blocks, and the hardware/software interface. You will 
also have a running start in design by being able to thoughtfully identify and select specific off-the- 
shelf chips that can be incorporated into a working system. A multilevel perspective is critical to suc- 
cessful systems design, because a system architect must simultaneously consider high-level feature 
trade-offs and low-level implementation possibilities. Focusing on one and not the other will usually 
lead to a system that is either impractical (too expensive or complex) or one that is not really useful. 

Chapter 1, “Digital Logic,” introduces the fundamentals of Boolean logic, binary arithmetic, and 
flip-flops. Basic terminology and numerical representations that are used throughout digital systems 
design are presented as well. After completing this chapter, you will be familiar with specific logical
building blocks, which will make the supporting logic easier to recognize when reading about
higher-level concepts in later chapters.

Chapter 2, “Integrated Circuits and the 7400 Logic Families,” provides a general orientation to
integrated circuits and commonly used logic ICs. This chapter is where the rubber meets the road and
the basics of logic design become issues of practical implementation. Small design examples pro- 
vide an idea of how various logic chips can be connected to create functional subsystems. Attention 
is paid to readily available components and understanding IC specifications, without which chips 
cannot be understood and used. The focus is on design with real off-the-shelf components rather 
than abstract representations on paper. 

Chapter 3, “Basic Computer Architecture,” cracks open the heart of digital systems by explaining 
how computers and microprocessors function. Basic concepts, including instruction sets, memory, 
address decoding, bus interfacing, DMA, and assembly language, are discussed to create a complete 
picture of what a computer is and the basic components that go into all computers. Questions are not 
left as exercises for the reader. Rather, each mechanism and process in a basic computer is discussed. 
This knowledge enables you to move ahead and explore the individual concepts in more depth while 
maintaining an overall system-level view of how everything fits together. 

Chapter 4, “Memory,” discusses this cornerstone of digital systems. With the conceptual
understanding from Chapter 3 of what memory is and the functions that it serves, the discussion
progresses to explain specific types of memory devices, how they work, and how they are applicable 
to different computing applications. Trade-offs of various memory technologies, including SRAM, 
DRAM, flash, and EPROM, are explored to convey an understanding of why each technology has its 
place in various systems. 
Chapter 5, “Serial Communications,” presents one of the most basic aspects of systems design:
moving data from one system to another. Without data links, computers would be isolated islands. 
Communication is key to many applications, whether accessing the Internet or gathering data from a 
remote sensor. Topics including RS-232 interfaces, modems, and basic multinode networking are 
discussed with a focus on implementing real data links. 

Chapter 6, “Instructive Microprocessors and Microcomputer Elements,” walks through five ex- 
amples of real microprocessors and microcontrollers. The devices presented are significant because 
of their trail-blazing roles in defining modern computing architecture, as exhibited by the fact that, 
decades later, they continue to turn up in new designs in one form or another. These devices are used 
as vehicles to explain a wide range of computing issues from register, memory, and bus architectures 
to interrupt vectoring and operating system privilege levels. 



PART 2 ADVANCED DIGITAL SYSTEMS 



Digital systems operate by acquiring data, manipulating that data, and then transferring the results as 
dictated by the application. Part 2 builds on the foundations of Part 1 by exploring the state of the art 
in microprocessor, memory, communications, and logic implementation technologies. Effectively
conceiving and implementing such systems requires an understanding of what is possible, what is
practical, and what tools and building blocks exist with which to get started. On completing Parts 1 and 2,
you will have acquired a broad understanding of digital systems ranging from small microcontrollers 
to 32-bit microcomputer architecture and high-speed networking, and the logic design methodolo- 
gies that underlie them all. You will have the ability to look at a digital system, whether pre-existing 
or conceptual, and break it into its component parts — the first step in solving a problem. 

Chapter 7, “Advanced Microprocessor Concepts,” discusses the key architectural topics behind 
modern 32- and 64-bit computing systems. Basic concepts including RISC/CISC, floating-point
arithmetic, caching, virtual memory, pipelining, and DSP are presented from the perspective of what 
a digital hardware engineer needs to know to understand system-wide implications and design useful 
circuits. This chapter does not instruct the reader on how to build the fastest microprocessors, but it 
does explain how these devices operate and, more importantly, what system-level design consider- 
ations and resources are necessary to achieve a functioning system. 

Chapter 8, “High-Performance Memory Technologies,” presents the latest SDR/DDR SDRAM
and SDR/DDR/QDR SSRAM devices, explains how they work and why they are useful in high-per- 
formance digital systems, and discusses the design implications of each. Memory is used by more 
than just microprocessors. Memory is essential to communications and data processing systems. Un- 
derstanding the capabilities and trade-offs of such a central set of technologies is crucial to designing 
a practical system. Familiarity with all mainstream memory technologies is provided to enable a 
firm grasp of the applications best suited to each. 

Chapter 9, “Networking,” covers the broad field of digital communications from a digital hard- 
ware perspective. Network protocol layering is introduced to explain the various levels at which 
hardware and software interact in modern communication systems. Much of the hardware responsi- 
bility for networking lies at lower levels in moving bits onto and off of the communications medium. 
This chapter focuses on the low-level details of twisted-pair and fiber-optic media, transceiver tech- 
nologies, 8B10B channel coding, and error detection with CRC and checksum logic. A brief presen- 
tation of Ethernet concludes the chapter to show how a real networking standard functions. 

Chapter 10, “Logic Design and Finite State Machines,” explains how to implement custom logic 
to make a fully functional system. Most systems use a substantial quantity of off-the-shelf logic 
products to solve the problem at hand, but almost all require some custom support logic. This chap- 
ter begins by presenting hardware description languages, and Verilog in particular, as an efficient 
means of designing synchronous and combinatorial logic. Once the basic methodology of designing
logic has been discussed, common support logic solutions, including address decoding, control/sta- 
tus registers, and interrupt control logic, are shown with detailed design examples. Designing logic 
to handle asynchronous inputs across multiple clock domains is presented with specific examples. 
More complex logic circuits capable of implementing arbitrary algorithms are built from finite state 
machines — a topic explored in detail with design examples to ensure that the concepts are properly 
translated into reality. Finally, state machine optimization techniques, including pipelining, are dis- 
cussed to provide an understanding of how to design logic that can be reliably implemented. 

Chapter 11, “Programmable Logic Devices,” explains the various logic implementation technolo- 
gies that are used in a digital system. GALs, PALs, CPLDs, and FPGAs are presented from the per- 
spectives of how they work, how they are used to implement arbitrary logic designs, and the 
capabilities and features of each that make them suitable for various types of designs. These devices 
represent the glue that holds some systems together and the core operational elements of others. This 
chapter aids in deciding which technology is best suited to each logic application and how to select 
the right device to suit a specific need. 

PART 3 ANALOG BASICS FOR DIGITAL SYSTEMS 



All electrical systems are collections of analog circuits, but digital systems masquerade as discrete bi- 
nary entities when they are properly designed. It is necessary to understand certain fundamental top- 
ics in circuit analysis so that digital circuits can be made to behave in the intended binary manner. 
Part 3 addresses many essential analog topics that have direct relevance to designing successful digi- 
tal systems. Many digital engineers shrink away from basic DC and AC circuit analysis either for fear 
of higher mathematics or because it is not their area of expertise. This needn’t be the case, because 
most day-to-day analysis required for digital systems can be performed with basic algebra. Further- 
more, a digital systems slant on analog electronics enables many simplifications that are not possible 
in full-blown analog design. On completing this portion of the book, you will be able to apply passive 
components, discrete diodes and transistors, and op-amps in ways that support digital circuits. 

Chapter 12, “Electrical Fundamentals,” addresses basic DC and AC circuit analysis. Resistors, ca- 
pacitors, inductors, and transformers are explained with straightforward means of determining volt- 
ages and currents in simple analog circuits. Nonideal characteristics of passive components are 
discussed, which is a critical aspect of modern, high-speed digital systems. Many a digital system 
has failed because its designers were unaware of increasingly nonideal behavior of components as 
operating frequencies get higher. Frequency-domain analysis and basic filtering are presented to ex- 
plain common analog structures and how they can be applied to digital systems, especially in mini- 
mizing noise, a major contributor to transient and hard-to-detect problems. 

Chapter 13, “Diodes and Transistors,” explains the basic workings of discrete semiconductors and
provides specific and fully analyzed examples of how they are easily applied to digital applications. 
LEDs are covered as well as bipolar and MOS transistors. An understanding of how diodes and tran- 
sistors function opens up a great field of possible solutions to design problems. Diodes are essential 
in power-regulation circuits and serve as voltage references. Transistors enable electrical loads to be 
driven that are otherwise too heavy for a digital logic chip to handle. 

Chapter 14, “Operational Amplifiers,” discusses this versatile analog building block with many 
practical applications in digital systems. The design of basic amplifiers and voltage comparators is 
offered with many examples to illustrate all topics presented. All examples are thoroughly analyzed 
in a step-by-step process so that you can learn to use op-amps effectively on your own. Op-amps are 
useful in data acquisition and interface circuits, power supply and voltage monitoring circuits, and 
for implementing basic amplifiers and filters. This chapter applies the basic AC analysis skills ex- 
plained previously in designing hybrid analog/digital circuits to support a larger digital system. 
Chapter 15, “Analog Interfaces for Digital Systems,” covers the basics of analog-to-digital and
digital-to-analog conversion techniques. Many digital systems interact with real-world stimuli in- 
cluding audio, video, and radio frequencies. Data conversion is a key portion of these systems, en- 
abling continuous analog signals to be represented and processed as binary numbers. Several 
common means of performing data conversion are discussed along with fundamental concepts such 
as the Nyquist frequency and anti-alias filtering. 

PART 4 DIGITAL SYSTEM DESIGN IN PRACTICE 



When starting to design a new digital system, high-profile features such as the microprocessor and 
memory architecture often get most of the attention. Yet there are essential support elements that 
may be overlooked by those unfamiliar with them and unaware of the consequences of not taking 
time to address necessary details. All too often, digital engineers end up with systems that almost 
work. A microprocessor may work properly for a few hours and then quit. A data link may work fine 
one day and then experience inexplicable bit errors the next day. Sometimes these problems are the 
result of logic bugs, but mysterious behavior may be related to a more fundamental electrical flaw. 
The final part of this book explains the supporting infrastructure and electrical phenomena that must 
be understood to design and build reliable systems. 

Chapter 16, “Clock Distribution,” explores an essential component of all digital systems: proper 
generation and distribution of clocks. Many common clock generation and distribution methods are 
presented with detailed circuit implementation examples including low-skew buffers, termination, 
and PLLs. Related subjects, including frequency synthesis, DLLs, and source-synchronous clock- 
ing, are presented to lend a broad perspective on system-level clocking strategies. 

Chapter 17, “Voltage Regulation and Power Distribution,” discusses the fundamental power
infrastructure necessary for system operation. An introduction to general power handling is provided that
covers circuit specifications and safety issues. Thermal analysis is emphasized for
safety and reliability concerns. Basic regulator design with discrete components and integrated cir- 
cuits is explained with numerous illustrative circuits for each topic. The remainder of the chapter ad- 
dresses power distribution topics including wiring, circuit board power planes, and power supply 
decoupling capacitors. 

Chapter 18, “Signal Integrity,” delves into a set of topics that addresses the nonideal behavior of 
high-speed digital signals. The first half of this chapter covers phenomena that are common causes 
of corrupted digital signals. Transmission lines, signal reflections, crosstalk, and a wide variety of 
termination schemes are explained. These topics provide a basic understanding of what can go 
wrong and how circuits and systems can be designed to avoid signal integrity problems. Electromag- 
netic radiation, grounding, and static discharge are closely related subjects that are presented in the 
second half of the chapter. An overview is presented of the problems that can arise and their possible 
solutions. Examples illustrate concepts that apply to both circuit board design and overall system en- 
closure design — two equally important matters for consideration. 

Chapter 19, “Designing for Success,” explores a wide range of system-level considerations that 
should be taken into account during the product definition and design phases of a project. Compo- 
nent selection and circuit fabrication must complement the product requirements and available de- 
velopment and manufacturing resources. Often considered mundane, these topics are discussed 
because a successful outcome hinges on the availability and practicality of parts and technologies 
that are designed into a system. System testability is emphasized in this chapter from several per- 
spectives, because testing is prominent in several phases of product development. Test mechanisms 
including boundary scan (JTAG), specific hardware features, and software diagnostic routines en- 
able more efficient debugging and fault isolation in both laboratory and assembly line environments. 
Common computer-aided design software for digital systems is presented with an emphasis on Spice
analog circuit simulation. Spice applications are covered and augmented by complete examples that
start with circuits, proceed with Spice modeling, and end with Spice simulation result analysis. The 
chapter closes with a brief overview of common test equipment that is beneficial in debugging and 
characterizing digital systems. 

Following the main text is Appendix A, a brief list of recommended resources for further reading 
and self-education. Modern resources range from books to trade journals and magazines to web sites. 

Many specific vendors and products are mentioned throughout this book to serve as examples and 
as starting points for your exploration. However, there are many more companies and products than 
can be practically listed in a single text. Do not hesitate to search out and consider manufacturers not 
mentioned here, because the ideal component for your application might otherwise lie undiscovered. 
When specific components are described in this book, they are described in the context of the discus- 
sion at hand. References to component specifications cannot substitute for a vendor’s data sheet, be- 
cause there is not enough room to exhaustively list all of a component’s attributes, and such 
specifications are always subject to revision by the manufacturer. Be sure to contact the manufac- 
turer of a desired component to get the latest information on that product. Component manufacturers 
have a vested interest in providing you with the necessary information to use their products in a safe 
and effective manner. It is wise to take advantage of the resources that they offer. The widespread 
use of the Internet has greatly simplified this task. 

True proficiency in a trade comes with time and practice. There is no substitute for experience or 
mentoring from more senior engineers. However, being pointed in the right direction while acquiring
this experience can not only speed up the learning process but also make it more enjoyable.
With the right guide, a motivated beginner’s efforts can be more effectively channeled through
the early adoption of sound design practices and knowing where to look for necessary information. I 
sincerely hope that this book can be your guide, and I wish you the best of luck in your endeavors. 



Mark Balch 
CHAPTER 1

Digital Logic 



All digital systems are founded on logic design. Logic design transforms algorithms and processes 
conceived by people into computing machines. A grasp of digital logic is crucial to the understand- 
ing of other basic elements of digital systems, including microprocessors. This chapter addresses vi- 
tal topics ranging from Boolean algebra to synchronous logic to timing analysis with the goal of 
providing a working set of knowledge that is the prerequisite for learning how to design and imple- 
ment an unbounded range of digital systems. 

Boolean algebra is the mathematical basis for logic design and establishes the means by which a 
task’s defining rules are represented digitally. The topic is introduced in stages starting with basic 
logical operations and progressing through the design and manipulation of logic equations. Binary 
and hexadecimal numbering and arithmetic are discussed to explain how logic elements accomplish 
significant and practical tasks. 

With an understanding of how basic logical relationships are established and implemented, the 
discussion moves on to explain flip-flops and synchronous logic design. Synchronous logic comple- 
ments Boolean algebra, because it allows logic operations to store and manipulate data over time. 
Digital systems would be impossible without a deterministic means of advancing through an algo- 
rithm’s sequential steps. Boolean algebra defines algorithmic steps, and the progression between 
steps is enabled by synchronous logic. 

Synchronous logic brings time into play along with the associated issue of how fast a circuit can 
reliably operate. Logic elements are constructed using real electrical components, each of which has 
physical requirements that must be satisfied for proper operation. Timing analysis is discussed as a 
basic part of logic design, because it quantifies the requirements of real components and thereby es- 
tablishes a digital circuit’s practical operating conditions. 

The chapter concludes with a presentation of higher-level logic constructs that are built up from 
the basic logic elements already discussed. These elements, including multiplexers, tri-state buffers, 
and shift registers, are considered to be fundamental building blocks in digital system design. The 
remainder of this book, and digital engineering as a discipline, builds on and makes frequent refer- 
ence to the fundamental items included in this discussion. 



1.1 BOOLEAN LOGIC 



Machines of all types, including computers, are designed to perform specific tasks in exact,
well-defined manners. Some machine components are purely physical in nature, because their composition
and behavior are strictly regulated by chemical, thermodynamic, and physical properties. For example,
an engine is designed to transform the energy released by the combustion of gasoline and oxygen
into the rotation of a crankshaft. Other machine components are algorithmic in nature, because their
designs primarily follow constraints necessary to implement a set of logical functions as defined by
human beings rather than the laws of physics. A traffic light’s behavior, for instance, is predominantly
defined by human beings rather than by natural physical laws. This book is concerned with the design
of digital systems that are suited to the algorithmic requirements of their particular range of
applications. Digital logic and arithmetic are critical building blocks in constructing such systems.

Copyright 2003 by The McGraw-Hill Companies, Inc.

An algorithm is a procedure for solving a problem through a series of finite and specific steps. It 
can be represented as a set of mathematical formulas, lists of sequential operations, or any combina- 
tion thereof. Each of these finite steps can be represented by a Boolean logic equation. Boolean logic 
is a branch of mathematics that was developed in the nineteenth century by an English mathematician
named George Boole. The basic theory is that logical relationships can be modeled by algebraic
equations. Rather than using arithmetic operations such as addition and subtraction, Boolean algebra
employs logical operations including AND, OR, and NOT. Boolean variables have two enumerated 
values: true and false, represented numerically as 1 and 0, respectively. 

The AND operation is mathematically defined as the product of two Boolean values, denoted A 
and B for reference. Truth tables are often used to illustrate logical relationships as shown for the 
AND operation in Table 1.1. A truth table provides a direct mapping between the possible inputs and 
outputs. A basic AND operation has two inputs with four possible combinations, because each input 
can be either 1 or 0 (true or false). Mathematical rules apply to Boolean algebra, resulting in a
nonzero product only when both inputs are 1.



TABLE 1.1 AND Operation Truth Table

A   B   A AND B
0   0   0
0   1   0
1   0   0
1   1   1



Summation is represented by the OR operation in Boolean algebra, as shown in Table 1.2. Only
one combination of inputs to the OR operation results in a zero sum: 0 + 0 = 0.

TABLE 1.2 OR Operation Truth Table

A   B   A OR B
0   0   0
0   1   1
1   0   1
1   1   1



AND and OR are referred to as binary operators, because they require two operands. NOT is a 
unary operator, meaning that it requires only one operand. The NOT operator returns the complement
of the input: 1 becomes 0, and 0 becomes 1. When a variable is passed through a NOT operator, it is
said to be inverted.
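In the same 0/1 integer representation, OR behaves like an addition that saturates at 1, which captures the rule that only 0 + 0 produces a zero sum, and NOT is the complement 1 - a. A short Python sketch, again using hypothetical helper names:

```python
def or_gate(a: int, b: int) -> int:
    """Two-input OR: a Boolean 'sum' that is clamped so it never exceeds 1."""
    return min(a + b, 1)


def not_gate(a: int) -> int:
    """Unary NOT: returns the complement of a 0/1 input."""
    return 1 - a


for a in (0, 1):
    for b in (0, 1):
        print(f"{a} OR {b} = {or_gate(a, b)}")
print("NOT 0 =", not_gate(0), " NOT 1 =", not_gate(1))
```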
Boolean variables may not seem too interesting on their own. It is what they can be made to
represent that leads to useful constructs. A rather contrived example can be made from the following
logical statement: 

“If today is Saturday or Sunday and it is warm, then put on shorts.” 

Three Boolean inputs can be inferred from this statement: Saturday, Sunday, and warm. One Bool- 
ean output can be inferred: shorts. These four variables can be assembled into a single logic equation 
that computes the desired result, 

shorts = (Saturday OR Sunday) AND warm 

While this is a simple example, it is representative of the fact that any logical relationship can be ex- 
pressed algebraically with products and sums by combining the basic logic functions AND, OR, and 
NOT. 
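The shorts equation can be evaluated exhaustively to confirm that it matches the English statement. This Python sketch uses the `and`/`or` keywords on `bool` values; the function name `shorts` simply mirrors the output variable in the text and is otherwise an illustrative assumption:

```python
from itertools import product


def shorts(saturday: bool, sunday: bool, warm: bool) -> bool:
    """shorts = (Saturday OR Sunday) AND warm"""
    return (saturday or sunday) and warm


# Print all eight combinations of the three Boolean inputs.
for sat, sun, warm in product((False, True), repeat=3):
    print(f"Saturday={sat!s:5} Sunday={sun!s:5} warm={warm!s:5} -> shorts={shorts(sat, sun, warm)}")
```

Only three of the eight rows produce shorts: a warm Saturday, a warm Sunday, and the (contrived) case where both day inputs are true.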

Several other logic functions are regarded as elemental, even though they can be broken down 
into AND, OR, and NOT functions. These are not-AND (NAND), not-OR (NOR), exclusive-OR 
(XOR), and exclusive-NOR (XNOR). Table 1.3 presents the logical definitions of these other basic 
functions. XOR is an interesting function, because it implements a sum that is distinct from OR by 
taking into account that 1 + 1 does not equal 1. As will be seen later, XOR plays a key role in
arithmetic for this reason.



TABLE 1.3 NAND, NOR, XOR, XNOR Truth Table

A   B   A NAND B   A NOR B   A XOR B   A XNOR B
0   0   1          1         0         1
0   1   1          0         1         0
1   0   1          0         1         0
1   1   0          0         0         1
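Because NAND, NOR, XOR, and XNOR can be broken down into AND, OR, and NOT, Table 1.3 can be regenerated from those three primitives alone. In the Python sketch below (all function names are illustrative), XOR is built as (A AND NOT B) OR (NOT A AND B), and an assertion confirms that it also equals the sum of its inputs modulo 2, which is the sense in which 1 + 1 does not equal 1:

```python
from itertools import product


# The three primitive operations on 0/1 integers.
def and_(a: int, b: int) -> int:
    return a & b


def or_(a: int, b: int) -> int:
    return a | b


def not_(a: int) -> int:
    return 1 - a


# The other four gates, each composed only from the primitives above.
def nand(a: int, b: int) -> int:
    return not_(and_(a, b))


def nor(a: int, b: int) -> int:
    return not_(or_(a, b))


def xor(a: int, b: int) -> int:
    # (A AND NOT B) OR (NOT A AND B)
    return or_(and_(a, not_(b)), and_(not_(a), b))


def xnor(a: int, b: int) -> int:
    return not_(xor(a, b))


print("A B | NAND NOR XOR XNOR")
for a, b in product((0, 1), repeat=2):
    print(f"{a} {b} |  {nand(a, b)}    {nor(a, b)}    {xor(a, b)}    {xnor(a, b)}")
    # XOR matches the arithmetic sum of its inputs with the carry discarded.
    assert xor(a, b) == (a + b) % 2
```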



All binary operators can be chained together to implement a wide function of any number of inputs.
For example, the truth table for a ten-input AND function would result in a 1 output only when
all inputs are 1. Similarly, the truth table for a seven-input OR function would result in a 1 output if
any of the seven inputs is 1. A four-input XOR, however, will result in a 1 output only if there is
an odd number of 1s at the inputs. This is because of the logical daisy chaining of multiple binary
XOR operations. As shown in Table 1.3, two 1s presented to an XOR function cancel each other out,
so any even number of 1s in the chain contributes nothing to the result.
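Chaining a binary operator across many inputs is a fold. The Python sketch below (the `wide_*` names are hypothetical) uses `functools.reduce` with the standard `operator.and_`, `operator.or_`, and `operator.xor` functions to build wide gates, and checks that a four-input XOR outputs 1 exactly when an odd number of its inputs are 1:

```python
from functools import reduce
from itertools import product
import operator


def wide_and(*bits: int) -> int:
    """AND of one or more 0/1 inputs, built by chaining two-input ANDs."""
    return reduce(operator.and_, bits)


def wide_or(*bits: int) -> int:
    """OR of one or more 0/1 inputs, built by chaining two-input ORs."""
    return reduce(operator.or_, bits)


def wide_xor(*bits: int) -> int:
    """XOR of one or more 0/1 inputs, built by chaining two-input XORs."""
    return reduce(operator.xor, bits)


# A four-input XOR acts as an odd-parity detector.
for bits in product((0, 1), repeat=4):
    assert wide_xor(*bits) == sum(bits) % 2

print(wide_and(*([1] * 10)))          # ten-input AND, all ones: prints 1
print(wide_or(0, 0, 0, 1, 0, 0, 0))   # seven-input OR, one input set: prints 1
```

As written, each function expects at least one input; `reduce` raises `TypeError` on an empty sequence when no initial value is supplied.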

It quickly grows unwieldy to write out the names of logical operators. Concise algebraic expres- 
sions are written by using the graphical representations shown in Table 1.4. Note that each operation 
has multiple symbolic representations. The choice of representation is a matter of style when hand- 
written and is predetermined when programming a computer by the syntactical requirements of each 
computer programming language. 

A common means of representing the output of a generic logical function is with the variable Y.
Therefore, the AND function of two variables, A and B, can be written as Y = A & B or Y = A * B. As
with normal mathematical notation, products can also be written by placing terms right next to each
other, such as Y = AB. Notation for the inverted functions, NAND, NOR, and XNOR, is achieved by
inverting the base function. Two equally valid ways of representing NAND are \(Y = \overline{A \& B}\)
and Y = !(AB). Similarly, an XNOR might be written as \(Y = \overline{A \oplus B}\).

TABLE 1.4 Symbolic Representations of Standard Boolean Operators

Boolean Operation   Operators
AND                 *, &
OR                  +, |, #
XOR                 \(\oplus\), ^
NOT                 !, ~, \(\overline{A}\)
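Programming languages fix their own spellings for these operators. As an illustration, assuming Python: AND, OR, and XOR map onto the integer operators &, |, and ^, but Python's ~ is a bitwise NOT on unbounded integers (~1 evaluates to -2), so the sketch below flips a single bit with ^ 1 instead. The helper names are hypothetical. Running it prints a header followed by the four rows of Table 1.3, in the same order.

```python
def inv(x):
    """One-bit NOT. Python's ~ works on unbounded integers (~1 == -2),
    so XOR with 1 is used to flip a single bit instead."""
    return x ^ 1

def nand(a, b):
    return inv(a & b)

def nor(a, b):
    return inv(a | b)

def xor(a, b):
    return a ^ b

def xnor(a, b):
    return inv(a ^ b)

print("A B NAND NOR XOR XNOR")
for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand(a, b), nor(a, b), xor(a, b), xnor(a, b))
```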

When logical functions are converted into circuits, graphical representations of the seven basic 
operators are commonly used. In circuit terminology, the logical operators are called gates. Figure 
1.1 shows how the basic logic gates are drawn on a circuit diagram. Naming the inputs of each gate 
A and B and the output Y is for reference only; any name can be chosen for convenience. A small 
bubble is drawn at a gate’s output to indicate a logical inversion. 

More complex Boolean functions are created by combining Boolean operators in the same way 
that arithmetic operators are combined in normal mathematics. Parentheses are useful to explicitly 
convey precedence information so that there is no ambiguity over how two variables should be 
treated. A Boolean function might be written as 

\(Y = (AB + C + D) \& E \oplus F\)

This same equation could be represented graphically in a circuit diagram, also called a schematic
diagram, as shown in Fig. 1.2. This representation uses only two-input logic gates. As already
mentioned, binary operators can be chained together to implement functions of more than two variables.



FIGURE 1.1 Graphical representation of basic logic gates. [Gate symbols shown: AND, OR, XOR, NAND,
NOR, XNOR, NOT]



FIGURE 1.2 Schematic diagram of logic function.

An alternative graphical representation would use a three-input OR gate by collapsing the two-input
OR gates into a single entity.



1.2 BOOLEAN MANIPULATION 



Boolean equations are invaluable when designing digital logic. To properly use and devise such 
equations, it is helpful to understand certain basic rules that enable simplification and re-expression 
of Boolean logic. Simplification is perhaps the most practical final result of Boolean manipulation, 
because it is easier and less expensive to build a circuit that does not contain unnecessary compo- 
nents. When a logical relationship is first set down on paper, it often is not in its most simplified 
form. Such a circuit will function but may be unnecessarily complex. Re-expression of a Boolean 
equation is a useful skill, because it can enable you to take better advantage of the logic resources at 
your disposal instead of always having to use new components each time the logic is expanded or 
otherwise modified to function in a different manner. As will soon be shown, an OR gate can be 
made to behave as an AND gate, and vice versa. Such knowledge can enable you to build a less- 
complex implementation of a Boolean equation. 

First, it is useful to mention two basic identities: 

\(A \& \overline{A} = 0\) and \(A + \overline{A} = 1\)

The first identity states that the product of any variable and its logical negation must always be false. 
It has already been shown that both operands of an AND function must be true for the result to be 
true. Therefore, the first identity holds true, because it is impossible for both operands to be true 
when one is the negation of the other. The second identity states that the sum of any variable and its 
logical negation must always be true. At least one operand of an OR function must be true for the re- 
sult to be true. As with the first identity, it is guaranteed that one operand will be true, and the other 
will be false. 

Boolean algebra also has commutative, associative, and distributive properties as listed below: 

• Commutative: A & B = B & A and A + B = B + A 

• Associative: (A & B) & C = A & (B & C) and (A + B) + C = A + (B + C) 

• Distributive: A & (B + C) = A & B + A & C

The aforementioned identities, combined with these basic properties, can be used to simplify logic.
For example,

\(A \& B \& C + A \& B \& \overline{C}\)

can be re-expressed using the distributive property as

\(A \& B \& (C + \overline{C})\)

which we know by identity equals

\(A \& B \& (1) = A \& B\)
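Because each variable can take only two values, identities such as these can also be checked by brute force over every input combination. The following Python sketch (the helper name inv is hypothetical) does that for the two basic identities, the commutative, associative, and distributive properties, and the simplification just shown; a failing assert would flag a relationship that does not hold.

```python
from itertools import product

def inv(x):
    """One-bit NOT (Python's ~ would turn 1 into -2)."""
    return x ^ 1

# Python's comparisons bind more loosely than &, |, and ^,
# so "a & inv(a) == 0" means "(a & inv(a)) == 0".
for a, b, c in product((0, 1), repeat=3):
    assert a & inv(a) == 0                           # A & !A = 0
    assert a | inv(a) == 1                           # A + !A = 1
    assert a & b == b & a and a | b == b | a         # commutative
    assert (a & b) & c == a & (b & c)                # associative, AND
    assert (a | b) | c == a | (b | c)                # associative, OR
    assert a & (b | c) == (a & b) | (a & c)          # distributive
    assert (a & b & c) | (a & b & inv(c)) == a & b   # the simplification above

print("all checks passed")
```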

Another useful identity, \(A + \overline{A}B = A + B\), can be illustrated using the truth table
shown in Table 1.5.
TABLE 1.5 \(A + \overline{A}B = A + B\) Truth Table

A   B   \(\overline{A}B\)   \(A + \overline{A}B\)   \(A + B\)
0   0   0                   0                       0
0   1   1                   1                       1
1   0   0                   1                       1
1   1   0                   1                       1



Augustus DeMorgan, another nineteenth-century English mathematician, worked out a logical
transformation that is known as DeMorgan’s law, which has great utility in simplifying and
re-expressing Boolean equations. Simply put, DeMorgan's law states



\(\overline{A + B} = \overline{A} \& \overline{B}\) and \(\overline{A \& B} = \overline{A} + \overline{B}\)

These transformations are very useful, because they show the direct equivalence of AND and OR
functions and how one can be readily converted to the other. XOR and XNOR functions can be
represented by combining AND and OR gates. It can be observed from Table 1.3 that
\(A \oplus B = A\overline{B} + \overline{A}B\) and that \(\overline{A \oplus B} = AB + \overline{A}\,\overline{B}\).
Conversions between XOR/XNOR and AND/OR functions are helpful when manipulating and simplifying
larger Boolean expressions, because simpler AND and OR functions are directly handled with
DeMorgan's law, whereas XOR/XNOR functions are not.
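DeMorgan's law is what lets an OR gate stand in for an AND gate, as mentioned at the start of this section: inverting both inputs and the output of an OR yields AND. The sketch below, in Python with hypothetical helper names, checks both forms of the law and the AND/OR expansions of XOR and XNOR over all four input combinations.

```python
from itertools import product

def inv(x):
    """One-bit NOT."""
    return x ^ 1

def and_from_or(a, b):
    """DeMorgan: an OR gate with inverted inputs and an inverted output acts as an AND."""
    return inv(inv(a) | inv(b))

def xor_from_and_or(a, b):
    """A xor B = A!B + !AB, using only AND, OR, and NOT."""
    return (a & inv(b)) | (inv(a) & b)

def xnor_from_and_or(a, b):
    """A xnor B = AB + !A!B."""
    return (a & b) | (inv(a) & inv(b))

for a, b in product((0, 1), repeat=2):
    assert inv(a | b) == inv(a) & inv(b)      # !(A + B) = !A & !B
    assert inv(a & b) == inv(a) | inv(b)      # !(A & B) = !A + !B
    assert and_from_or(a, b) == a & b
    assert xor_from_and_or(a, b) == a ^ b
    assert xnor_from_and_or(a, b) == inv(a ^ b)
```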



1.3 THE KARNAUGH MAP 



Generating Boolean equations to implement a desired logic function is a necessary step before a cir- 
cuit can be implemented. Truth tables are a common means of describing logical relationships be- 
tween Boolean inputs and outputs. Once a truth table has been created, it is not always easy to 
convert that truth table directly into a Boolean equation. This translation becomes more difficult as 
the number of variables in a function increases. A graphical means of translating a truth table into a 
logic equation was invented by Maurice Karnaugh in the early 1950s and today is called the Kar- 
naugh map , or K-map. A K-map is a type of truth table drawn such that individual product terms can 
be picked out and summed with other product terms extracted from the map to yield an overall Bool- 
ean equation. The best way to explain how this process works is through an example. Consider the 
hypothetical logical relationship in Table 1.6. 



TABLE 1.6 Function of Three Variables

A  B  C  Y
0  0  0  1
0  0  1  1
0  1  0  0
0  1  1  1
1  0  0  1
1  0  1  1
1  1  0  0
1  1  1  0
If the corresponding Boolean equation does not immediately become clear, the truth table can be
converted into a K-map as shown in Fig. 1.3. The K-map has one box for every combination of inputs,
and the desired output for a given combination is written into the corresponding box. Each axis of a
K-map represents up to two variables, enabling a K-map to solve a function of up to four variables.
Individual grid locations on each axis are labeled with a unique combination of the variables
represented on that axis. The labeling pattern is important, because only one variable per axis is
permitted to differ between adjacent boxes. Therefore, the pattern “00, 01, 10, 11” is not proper,
because both variables change between 01 and 10, but the pattern “11, 01, 00, 10” would work as well
as the pattern shown.
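The axis labeling rule amounts to ordering the labels so that neighbors, including the wrap-around pair at the two edges, differ in exactly one bit; such an ordering is commonly called a Gray code. A minimal Python sketch (function names hypothetical) that generates the labels with the well-known i XOR (i >> 1) construction and checks the adjacency rule:

```python
def gray_code(n_bits):
    """Return n-bit labels in reflected Gray code order."""
    return [format(i ^ (i >> 1), f"0{n_bits}b") for i in range(2 ** n_bits)]

def differs_by_one_bit(x, y):
    """True when two equal-length labels differ in exactly one position."""
    return sum(p != q for p, q in zip(x, y)) == 1

labels = gray_code(2)
print(labels)   # ['00', '01', '11', '10']

# Every pair of neighbors, including the wrap-around pair, differs in one bit.
for i, label in enumerate(labels):
    assert differs_by_one_bit(label, labels[(i + 1) % len(labels)])

# The natural binary order fails the rule: 01 -> 10 changes both bits.
assert not differs_by_one_bit("01", "10")
```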

K-maps are solved using the sum of products principle, which states that any relationship can be 
expressed by the logical OR of one or more AND terms. Product terms in a K-map are recognized by 
picking out groups of adjacent boxes that all have a state of 1. The simplest product term is a single 
box with a 1 in it, and that term is the product of all variables in the K-map with each variable either 
inverted or not inverted such that the result is 1. For example, a 1 is observed in the box that
corresponds to A = 0, B = 1, and C = 1. The product term representation of that box would be \(\overline{A}BC\). A
brute force solution is to sum together as many product terms as there are boxes with a state of 1 
(there are five in this example) and then simplify the resulting equation to obtain the final result. This 
approach can be taken without going to the trouble of drawing a K-map. The purpose of a K-map is 
to help in identifying minimized product terms so that lengthy simplification steps are unnecessary. 

Minimized product terms are identified by grouping together as many adjacent boxes with a state of 1
as possible, subject to the rules of Boolean algebra. Keep in mind that, to generate a valid product
term, all boxes in a group must have an identical relationship to all of the equation’s input
variables. This requirement translates into a rule that product term groups must be found in
power-of-two quantities. For a three-variable K-map, product term groups can have only 1, 2, 4, or 8
boxes in them.

Going back to our example, a four-box product term is formed by grouping together the vertically
stacked 1s on the left and right edges of the K-map. An interesting aspect of a K-map is that an edge
wraps around to the other side, because the axis labeling pattern remains continuous. The validity of
this wrapping concept is shown by the fact that all four boxes share a common relationship with the
input variables: their product term is \(\overline{B}\). The other variables, A and C, can be ruled
out, because the boxes are 1 regardless of the state of A and C. Only variable B is a determining
factor, and it must be 0 for the boxes to have a state of 1. Once a product term has been identified,
it is marked by drawing a ring around it as shown in Fig. 1.4. Because the product term crosses the
edges of the table, half-rings are shown in the appropriate locations.

        A,B
  C     00   01   11   10
  0      1    0    0    1
  1      1    1    0    1

FIGURE 1.3 Karnaugh map for function of three variables.

FIGURE 1.4 Partially completed Karnaugh map for a function of three variables.

There is still a box with a 1 in it that has not yet been accounted for. One approach could be to
generate a product term for that single box, but this would not result in a fully simplified equation,
because a larger group can be formed by associating the lone box with the adjacent box corresponding
to A = 0, B = 0, and C = 1. K-map boxes can be part of multiple groups, and forming the largest groups
possible results in a fully simplified equation. This second group of boxes is circled in Fig. 1.5 to
complete the map. This product term shares a common relationship where A = 0, C = 1, and B is
irrelevant: \(\overline{A}C\). It may appear tempting to create a product term consisting of the three
boxes on the bottom edge of the K-map. This is not valid, because a group of three violates the
power-of-two rule mentioned previously: the three boxes do not all share a common product
relationship. Upon completing the K-map, all product terms are summed to yield a final and simplified
Boolean equation that relates the input variables and the output: \(Y = \overline{B} + \overline{A}C\).
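A K-map result can be double-checked by evaluating it against every row of the original truth table. This Python sketch encodes Table 1.6 as a dictionary (the names table_1_6 and y_from_kmap are hypothetical) and confirms that \(Y = \overline{B} + \overline{A}C\) reproduces all eight rows.

```python
# Table 1.6, keyed by (A, B, C) -> Y
table_1_6 = {
    (0, 0, 0): 1, (0, 0, 1): 1, (0, 1, 0): 0, (0, 1, 1): 1,
    (1, 0, 0): 1, (1, 0, 1): 1, (1, 1, 0): 0, (1, 1, 1): 0,
}

def inv(x):
    """One-bit NOT."""
    return x ^ 1

def y_from_kmap(a, b, c):
    """Sum of the two K-map product terms: Y = !B + !A C."""
    return inv(b) | (inv(a) & c)

for (a, b, c), expected in table_1_6.items():
    assert y_from_kmap(a, b, c) == expected, (a, b, c)
print("K-map result matches Table 1.6")
```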

Functions of four variables are just as easy to solve using a K-map. Beyond four variables, it is 
preferable to break complex functions into smaller subfunctions and then combine the Boolean 
equations once they have been determined. Figure 1.6 shows an example of a completed Karnaugh 
map for a hypothetical function of four variables. Note the overlap between several groups to 
achieve a simplified set of product terms. The larger a group is, the fewer unique terms will be
required to represent its logic. There is nothing to lose and something to gain by forming a larger
group whenever possible. This K-map has four product terms that are summed for a final result: 
Y = A C + B C + ABD + ABCD . 

In both preceding examples, each result box in the truth table and Karnaugh map had a clearly de- 
fined state. Some logical relationships, however, do not require that every possible result necessarily 
be a one or a zero. For example, out of 16 possible results from the combination of four variables, 
only 14 results may be mandated by the application. This may sound odd, but one explanation could 
be that the particular application simply cannot provide the full 16 combinations of inputs. The spe- 
cific reasons for this are as numerous as the many different applications that exist. In such circum- 
stances these so-called don’t care results can be used to reduce the complexity of your logic. 
Because the application does not care what result is generated for these few combinations, you can 
arbitrarily set the results to 0s or 1s so that the logic is minimized. Figure 1.7 is an example that
modifies the Karnaugh map in Fig. 1.6 such that two don’t care boxes are present. Don't care values
are most commonly represented with “x” characters. The presence of one x enables simplification of 
the resulting logic by converting it to a 1 and grouping it with an adjacent 1. The other x is set to 0 so 
that it does not waste additional logic terms. The new Boolean equation is simplified by removing B 
from the last term, yielding Y = AC + BC + ABD + ACD . It is helpful to remember that x val- 
ues can generally work to your benefit, because their presence imposes fewer requirements on the 
logic that you must create to get the job done. 



FIGURE 1.5 Completed Karnaugh map for a function of three variables.

FIGURE 1.6 Completed Karnaugh map for function of four variables.

FIGURE 1.7 Karnaugh map for function of four variables with two “don’t care” values.



1.4 BINARY AND HEXADECIMAL NUMBERING



The fact that there are only two valid Boolean values, 1 and 0, makes the binary numbering system
appropriate for logical expression and, therefore, for digital systems. Binary is a base-2 system in
which only the digits 1 and 0 exist. Binary follows the same laws of mathematics as decimal, or
base-10, numbering. In decimal, the number 191 is understood to mean one hundred plus nine tens plus
one. It has this meaning, because each digit represents a successively higher power of ten as it
moves farther left of the decimal point. Representing 191 in mathematical terms to illustrate these
increasing powers of ten can be done as follows:

\[191 = 1 \times 10^2 + 9 \times 10^1 + 1 \times 10^0\]



Binary follows the same rule, but instead of powers of ten, it works on powers of two. The number
110 in binary (written as \(110_2\) to explicitly denote base 2) does not equal \(110_{10}\)
(decimal). Rather, \(110_2 = 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 6_{10}\). The number
\(191_{10}\) can be converted to binary by performing successive division by decreasing powers of 2,
carrying each remainder down to the next step, as shown below:

\[
\begin{aligned}
191 \div 2^7 &= 191 \div 128 = 1 \text{ remainder } 63\\
63 \div 2^6 &= 63 \div 64 = 0 \text{ remainder } 63\\
63 \div 2^5 &= 63 \div 32 = 1 \text{ remainder } 31\\
31 \div 2^4 &= 31 \div 16 = 1 \text{ remainder } 15\\
15 \div 2^3 &= 15 \div 8 = 1 \text{ remainder } 7\\
7 \div 2^2 &= 7 \div 4 = 1 \text{ remainder } 3\\
3 \div 2^1 &= 3 \div 2 = 1 \text{ remainder } 1\\
1 \div 2^0 &= 1 \div 1 = 1 \text{ remainder } 0
\end{aligned}
\]



The final result is that \(191_{10} = 10111111_2\). Each binary digit is referred to as a bit. A group
of N bits can represent decimal numbers from 0 to \(2^N - 1\). There are eight bits in a byte, more
formally called an octet in certain circles, enabling a byte to represent numbers up to
\(2^8 - 1 = 255\). The preceding example shows the eight power-of-two terms in a byte. If each term,
or bit, has its maximum value of 1, the result is 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255.
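The successive-division procedure translates directly into a loop over decreasing powers of two. A minimal Python sketch, assuming an 8-bit result by default (to_binary is a hypothetical name); at each step divmod returns the quotient bit and the remainder that is carried to the next, smaller power:

```python
def to_binary(value, n_bits=8):
    """Convert by dividing by decreasing powers of two, as in the worked example."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError("value does not fit in n_bits")
    bits = []
    for power in range(n_bits - 1, -1, -1):
        quotient, value = divmod(value, 2 ** power)   # quotient is always 0 or 1
        bits.append(str(quotient))
    return "".join(bits)

print(to_binary(191))        # 10111111
print(int("10111111", 2))    # 191, converting back with Python's built-in
print(to_binary(255))        # 11111111, the largest value a byte can hold
```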

While binary notation directly represents digital logic states, it is rather cumbersome to work 
with, because one quickly ends up with long strings of ones and zeroes. Hexadecimal, or base 16 
(hex for short), is a convenient means of representing binary numbers in a more succinct notation. 
Hex matches up very well with binary, because one hex digit represents four binary digits, given that 
\(2^4 = 16\). A four-bit group is called a nibble. Because hex requires 16 digits, the letters “A”
through “F” are borrowed for use as hex digits beyond 9. The 16 hex digits are defined in Table 1.7.



TABLE 1.7 Hexadecimal Digits

Decimal value   Hex digit   Binary nibble
0               0           0000
1               1           0001
2               2           0010
3               3           0011
4               4           0100
5               5           0101
6               6           0110
7               7           0111
8               8           1000
9               9           1001
10              A           1010
11              B           1011
12              C           1100
13              D           1101
14              E           1110
15              F           1111



The preceding example, \(191_{10} = 10111111_2\), can be converted to hex easily by grouping the eight
bits into two nibbles and representing each nibble with a single hex digit:

\(1011_2 = (8 + 2 + 1)_{10} = 11_{10} = \mathrm{B}_{16}\)
\(1111_2 = (8 + 4 + 2 + 1)_{10} = 15_{10} = \mathrm{F}_{16}\)

Therefore, \(191_{10} = 10111111_2 = \mathrm{BF}_{16}\). There are two common prefixes, 0x and $, and a
common suffix, h, that indicate hex numbers. These styles are used as follows:
\(\mathrm{BF}_{16}\) = 0xBF = $BF = BFh. All three are used by engineers, because they are more
convenient than appending a subscript “16” to each number in a document or computer program.
Choosing one style over another is a matter of preference.
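Grouping bits into nibbles is mechanical enough to sketch in a few lines. The Python below (to_hex is a hypothetical helper) pads the bit string on the left to a multiple of four bits, converts each nibble using the mapping of Table 1.7, and compares the result with Python's built-in hex formatting:

```python
HEX_DIGITS = "0123456789ABCDEF"   # index = nibble value, as in Table 1.7

def to_hex(bit_string):
    """Group bits into nibbles from the right and map each nibble to a hex digit."""
    padded = bit_string.zfill((len(bit_string) + 3) // 4 * 4)
    nibbles = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(HEX_DIGITS[int(n, 2)] for n in nibbles)

print(to_hex("10111111"))   # BF
print(f"0x{191:X}")         # 0xBF, using Python's own formatting
print(int("BF", 16))        # 191
```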

Whether a number is written using binary or hex notation, it remains a string of bits, each of 
which is 1 or 0. Binary numbering allows arbitrary data processing algorithms to be reduced to 
Boolean equations and implemented with logic gates. Consider the equality comparison of two four- 
bit numbers, M and N. 



“If M = N, then the equality test is true.” 

Implementing this function in gates first requires a means of representing the individual bits that 
compose M and N. When a group of bits is used to represent a common entity, the bits are numbered in
ascending or descending order with zero usually being the smallest index. The bit that represents
\(2^0\) is termed the least-significant bit, or LSB, and the bit that represents the highest power of
two in the group is called the most-significant bit, or MSB. A four-bit quantity would have the MSB
represent \(2^3\). M and N can be ordered such that the MSB is bit number 3, and the LSB is bit number
0. Collectively, M and N may be represented as M[3:0] and N[3:0] to denote that each contains four 
bits with indices from 0 to 3. This presentation style allows any arbitrary bit of M or N to be 
uniquely identified with its index. 

Turning back to the equality test, one could derive the Boolean equation using a variety of
techniques. Equality testing is straightforward, because M and N are equal only if each bit in M
matches the bit in the corresponding position of N. Looking back to Table 1.3, it can be seen that the
XNOR gate implements a single-bit equality check. Each pair of bits, one from M and one from N, can be
passed through an XNOR gate, and then the four individual equality tests can be combined with an AND
gate to determine overall equality.



\[Y = \overline{M[3] \oplus N[3]} \& \overline{M[2] \oplus N[2]} \& \overline{M[1] \oplus N[1]} \& \overline{M[0] \oplus N[0]}\]



The four-bit equality test can be drawn schematically as shown in Fig. 1.8. 
Logic to compare one number against a constant is simpler than comparing two numbers, because the
number of inputs to the Boolean equation is cut in half. If, for example, one wanted to compare M[3:0]
to a constant \(1001_2\) (\(9_{10}\)), the logic would reduce to just a four-input AND gate with two
inverted inputs:



\[Y = M[3] \& \overline{M[2]} \& \overline{M[1]} \& M[0]\]
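Both comparators can be modeled bit by bit and checked against ordinary integer equality. In this Python sketch (helper names hypothetical), bit(m, i) plays the role of M[i], the XNOR is written as an inverted XOR, and every combination of two four-bit values is tested:

```python
def inv(x):
    """One-bit NOT."""
    return x ^ 1

def bit(value, index):
    """Return bit [index] of an integer, so bit(m, 3) corresponds to M[3]."""
    return (value >> index) & 1

def equal4(m, n):
    """AND together four XNORs, one per bit position."""
    y = 1
    for i in range(4):
        y &= inv(bit(m, i) ^ bit(n, i))   # XNOR: 1 when the two bits match
    return y

def equals_nine(m):
    """Compare M[3:0] with the constant 1001: a four-input AND with two inputs inverted."""
    return bit(m, 3) & inv(bit(m, 2)) & inv(bit(m, 1)) & bit(m, 0)

# Exhaustive check against Python's own == operator.
for m in range(16):
    assert equals_nine(m) == int(m == 9)
    for n in range(16):
        assert equal4(m, n) == int(m == n)
```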

When working with computers and other digital systems, numbers are almost always written in 
hex notation simply because it is far easier to work with fewer digits. In a 32-bit computer, a value 
can be written as either 8 hex digits or 32 bits. The computer’s logic always operates on raw binary 
quantities, but people generally find it easier to work in hex. An interesting historical note is that hex 
was not always the common method of choice for representing bits. In the early days of computing, 
through the 1960s and 1970s, octal (base-8) was used predominantly. Instead of a single hex digit
representing four bits, a single octal digit represents three bits, because \(2^3 = 8\). In octal,
\(191_{10} = 277_8\). Whereas bytes are the lingua franca of modern computing, groups of two or three
octal digits were common in earlier times.

Because of the inherent binary nature of digital systems, quantities are most often expressed in
orders of magnitude that are tied to binary rather than decimal numbering. For example, a “round
number” of bytes would be 1,024 (\(2^{10}\)) rather than 1,000 (\(10^3\)). Succinct terminology in
reference to quantities of data is enabled by a set of standard prefixes used to denote order of
magnitude. Furthermore, there is a convention of using a capital B to represent a quantity of bytes
and using a lowercase b to represent a quantity of bits. Commonly observed prefixes used to quantify
sets of data are listed in Table 1.8. The capacities of many memory chips and communications
interfaces are specified in units of bits, so one must be careful not to misunderstand a
specification. If you need to store 32 MB of data, be sure to use a 256 Mb memory chip (32 MB × 8 bits
per byte) rather than a 32 Mb device!



TABLE 1.8 Common Binary Magnitude Prefixes

Prefix  Definition                              Order of magnitude  Abbreviation  Usage
Kilo    \(1024^1\) = 1,024                      \(2^{10}\)           k             kB
Mega    \(1024^2\) = 1,048,576                  \(2^{20}\)           M             MB
Giga    \(1024^3\) = 1,073,741,824              \(2^{30}\)           G             GB
Tera    \(1024^4\) = 1,099,511,627,776          \(2^{40}\)           T             TB
Peta    \(1024^5\) = 1,125,899,906,842,624      \(2^{50}\)           P             PB
Exa     \(1024^6\) = 1,152,921,504,606,846,976  \(2^{60}\)           E             EB
The majority of digital components adhere to power-of-two magnitude definitions. However, some
industries break from these conventions largely for reasons of product promotion. A key example is
the hard disk drive industry, which specifies prefixes in decimal terms (e.g., 1 MB = 1,000,000
bytes). The advantage of doing this is to inflate the apparent capacity of the disk drive: a drive
that provides 10,000,000,000 bytes of storage can be labeled as “10 GB” in decimal terms, but it would
have to be labeled as only 9.31 GB in binary terms (\(10^{10} \div 2^{30} \approx 9.31\)).
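The arithmetic behind the two capacity examples in this section (the memory chip sized in bits and the disk drive labeled in decimal terms) is short enough to check directly. This Python snippet uses the power-of-two definitions of Table 1.8; the constant names are illustrative only.

```python
BINARY_GIGA = 2 ** 30          # 1,073,741,824 bytes, per Table 1.8
BINARY_MEGA = 2 ** 20          # 1,048,576

drive_bytes = 10 ** 10         # a drive sold as "10 GB" in decimal terms
print(f"{drive_bytes / BINARY_GIGA:.2f} GB in binary terms")   # 9.31 GB in binary terms

# Bytes versus bits: storing 32 MB of data needs 32 * 8 = 256 Mb of memory.
needed_bits = 32 * BINARY_MEGA * 8
print(needed_bits // BINARY_MEGA, "Mb")                        # 256 Mb
```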



1.5 BINARY ADDITION 



Despite the fact that most engineers use hex data representation, it has already been shown that logic 
gates operate on strings of bits that compose each unit of data. Binary arithmetic is performed ac- 
cording to the same rules as decimal arithmetic. When adding two numbers, each column of digits is 
added in sequence from right to left and, if the sum of any column is greater than the value of the 
highest digit, a carry is added to the next column. In binary, the largest digit is 1, so any sum greater 
than 1 will result in a carry. The addition of \(111_2\) and \(011_2\) (7 + 3 = 10) is illustrated below.



  1 1 1 0   carry bits
    1 1 1
+   0 1 1
  1 0 1 0



In the first column, the sum of two ones is \(2_{10}\), or \(10_2\), resulting in a carry to the second
column. The sum of the second column is \(3_{10}\), or \(11_2\), resulting in both a carry to the next
column and a one in the sum. When all three columns are completed, a carry remains, having been pushed
into a new fourth column. The carry is, in effect, added to leading 0s and descends to the sum line as
a 1.

The logic to perform binary addition is actually not very complicated. At the heart of a 1-bit adder 
is the XOR gate, whose result is the sum of two bits without the associated carry bit. An XOR gate 
generates a 1 when either input is 1, but not both. On its own, the XOR gate properly adds 0 + 0,
0 + 1, and 1 + 0. The fourth possibility, 1 + 1 = 2, requires a carry bit, because \(2_{10} = 10_2\). Given that a
carry is generated only when both inputs are 1, an AND gate can be used to produce the carry. A so- 
called half-adder is represented as follows: 



\(\mathrm{sum} = A \oplus B\)
\(\mathrm{carry} = AB\)

This logic is called a half-adder because it does only part of the job when multiple bits must be 
added together. Summing multibit data values requires a carry to ripple across the bit positions start- 
ing from the LSB. The half-adder has no provision for a carry input from the preceding bit position. 
A full-adder incorporates a carry input and can therefore be used to implement a complete summation
circuit for an arbitrarily large pair of numbers. Table 1.9 lists the complete full-adder input/output
relationship with a carry input (\(C_{IN}\)) from the previous bit position and a carry output
(\(C_{OUT}\)) to the next bit position. Note that all possible sums from zero to three are properly
accounted for by combining \(C_{OUT}\) and sum. When \(C_{IN} = 0\), the circuit behaves exactly like
the half-adder.
TABLE 1.9 1-Bit Full-Adder Truth Table

\(C_{IN}\)   A   B   \(C_{OUT}\)   Sum
0            0   0   0             0
0            0   1   0             1
0            1   0   0             1
0            1   1   1             0
1            0   0   0             1
1            0   1   1             0
1            1   0   1             0
1            1   1   1             1



Full-adder logic can be expressed in a variety of ways. It may be recognized that full-adder logic can
be implemented by connecting two half-adders in sequence as shown in Fig. 1.9. This full-adder
directly generates a sum by computing the XOR of all three inputs. The carry is obtained by combining
the carry from each addition stage. A logical OR is sufficient for \(C_{OUT}\), because there can never
be a case in which both half-adders generate a carry at the same time. If the A + B half-adder
generates a carry, the partial sum will be 0, making a carry from the second half-adder impossible.
The associated logic equations are as follows:

\(\mathrm{sum} = A \oplus B \oplus C_{IN}\)
\(C_{OUT} = AB + [(A \oplus B)C_{IN}]\)



Equivalent logic would be obtained using a K-map, although in a different form, because XOR/XNOR
functions are not direct results of K-map AND/OR solutions.
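The two-half-adder construction can be modeled directly, and chaining full-adders from the LSB upward gives the rippling carry described earlier for multibit sums. A minimal Python sketch with hypothetical function names; note that real hardware evaluates all bit positions concurrently, with the carry rippling through gate delays, whereas the loop here simply visits them in order. The block also checks every row of Table 1.9.

```python
def half_adder(a, b):
    """Return (sum, carry): XOR gives the sum bit, AND gives the carry."""
    return a ^ b, a & b

def full_adder(a, b, c_in):
    """Two half-adders in sequence, with their carries combined by an OR."""
    partial, carry1 = half_adder(a, b)
    total, carry2 = half_adder(partial, c_in)
    # The two half-adders can never both produce a carry, so OR is sufficient.
    assert not (carry1 and carry2)
    return total, carry1 | carry2

def ripple_add(x, y, width=4):
    """Add two unsigned integers bit by bit, passing each carry to the next position."""
    carry, result = 0, 0
    for i in range(width):                      # LSB first
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry                        # carry out of the top bit position

# Check every row of Table 1.9: C_OUT is worth 2 and Sum is worth 1.
for c_in in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            s, c_out = full_adder(a, b, c_in)
            assert 2 * c_out + s == a + b + c_in

print(ripple_add(0b111, 0b011))   # (10, 0), i.e., 111 + 011 = 1010
```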



FIGURE 1.9 Full-adder logic diagram.



1.6 SUBTRACTION AND NEGATIVE NUMBERS



Binary subtraction is closely related to addition. As with many operations, subtraction can be
implemented in a variety of ways. It is possible to derive a Boolean equation that directly subtracts
two numbers. However, an efficient solution is to add the negative of the subtrahend to the minuend
rather than directly subtracting the subtrahend from the minuend. These are, of course, identical
operations: A - B = A + (-B). This type of arithmetic is referred to as subtraction by addition of the
two’s complement. The two’s complement is the negative representation of a number that allows the
identity A - B = A + (-B) to hold true.

Subtraction requires a means of expressing negative numbers. To this end, the most-significant 
bit, or left-most bit, of a binary number is used as the sign-bit when dealing with signed numbers. A 
negative number is indicated when the sign-bit equals 1. Unsigned arithmetic does not involve a 
sign-bit, and therefore can express larger absolute numbers, because the MSB is merely an extra 
digit rather than a sign indicator. 

The first step in performing two’s complement subtraction is to convert the subtrahend into a neg- 
ative equivalent. This conversion is a two-step process. First, the binary number is inverted to yield a 
one’s complement. Then, 1 is added to the one’s complement version to yield the desired two's com- 
plement number. This is illustrated below: 



  0 1 0 1   Original number (5)
  1 0 1 0   One’s complement
+ 0 0 0 1   Add one
  1 0 1 1   Two’s complement (-5)



Observe that a four-bit number, which can represent unsigned values from 0 to 15, now represents
signed values from -8 to 7. The range about zero is asymmetrical because of the sign-bit and the fact
that there is no negative 0. Once the two’s complement has been obtained, subtraction is performed by
adding the two’s complement subtrahend to the minuend. For example, 7 - 5 = 2 would be performed as
follows, given the -5 representation obtained above:



1 1 1 1 0   Carry bits
  0 1 1 1   Minuend (7)
+ 1 0 1 1   “Subtrahend” (-5)
  0 0 1 0   Result (2)



Note that the final carry-bit past the sign-bit is ignored. An example of subtraction with a negative 
result is 3 - 5 = -2. 





    1 1 0   Carry bits
  0 0 1 1   Minuend (3)
+ 1 0 1 1   “Subtrahend” (-5)
  1 1 1 0   Result (-2)



Here, the result has its sign-bit set, indicating a negative quantity. We can check the answer by calcu- 
lating the two’s complement of the negative quantity. 
  1 1 1 0   Original number (-2)
  0 0 0 1   One’s complement
+ 0 0 0 1   Add one
  0 0 1 0   Two’s complement (2)



This check succeeds and shows that two's complement conversions work “both ways,” going back and forth
between negative and positive numbers. The exception to this rule is the asymmetrical case in which
the magnitude of the largest negative number is one more than the largest positive number as a result
of the presence of the sign-bit. A four-bit number, therefore, has no positive counterpart of -8.
Similarly, an 8-bit number has no positive counterpart of -128.
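The invert-and-add-one conversion and subtraction by addition fit in a few lines once a fixed width is assumed. This Python sketch assumes 4-bit values, as in the examples above (the names WIDTH, negate, to_signed, and subtract are hypothetical); masking the result to four bits is what discards the carry past the sign-bit.

```python
WIDTH = 4                      # assumed word size, matching the examples
MASK = (1 << WIDTH) - 1        # 0b1111

def negate(value):
    """Two's complement: invert every bit, then add one, keeping WIDTH bits."""
    ones = ~value & MASK       # one's complement
    return (ones + 1) & MASK   # add one; any carry past the sign-bit is dropped

def to_signed(bits):
    """Read a WIDTH-bit pattern as signed: a set sign-bit means a negative value."""
    return bits - (1 << WIDTH) if bits >> (WIDTH - 1) else bits

def subtract(minuend, subtrahend):
    """Compute A - B as A + (-B), ignoring the final carry past the sign-bit."""
    return (minuend + negate(subtrahend)) & MASK

print(format(negate(0b0101), "04b"))   # 1011, the -5 pattern
print(to_signed(subtract(7, 5)))       # 2
print(to_signed(subtract(3, 5)))       # -2
print(format(negate(0b1110), "04b"))   # 0010: converting -2 back yields 2
print(to_signed(negate(0b1000)))       # -8: it has no positive counterpart
```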



1.7 MULTIPLICATION AND DIVISION



Multiplication and division follow the same mathematical rules used in decimal numbering. How- 
ever, their implementation is substantially more complex as compared to addition and subtraction. 
Multiplication can be performed inside a computer in the same way that a person does so on paper. 
Consider 12 x 12 = 144. 



    1 2
x   1 2
    2 4   Partial product \(\times 10^0\)
  1 2     Partial product \(\times 10^1\)
  1 4 4   Final product



The multiplication process grows in steps as the number of digits in each multiplicand increases,
because the number of partial products increases. Binary numbers function the same way, but there
easily can be many partial products, because numbers require more digits to represent them in binary
than in decimal. Here is the same multiplication expressed in binary (1100 x 1100 = 10010000):







          1 1 0 0
        x 1 1 0 0
          0 0 0 0   Partial product \(\times 2^0\)
        0 0 0 0     Partial product \(\times 2^1\)
      1 1 0 0       Partial product \(\times 2^2\)
+   1 1 0 0         Partial product \(\times 2^3\)
  1 0 0 1 0 0 0 0   Final product
Walking through these partial products takes extra logic and time, which is why multiplication and,
by extension, division are considered advanced operations that are not nearly as common as addition 
and subtraction. Methods of implementing these functions require trade-offs between logic com- 
plexity and the time required to calculate a final result. 
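The paper procedure above corresponds to what is usually called shift-and-add multiplication: each 1 bit in the multiplier contributes a copy of the multiplicand shifted left by that bit's position, and each 0 bit contributes an all-zero partial product. A minimal Python sketch (function name hypothetical) that handles one partial product per step, one example of trading time for logic:

```python
def shift_and_add(multiplicand, multiplier):
    """Sum one shifted copy of the multiplicand for each 1 bit in the multiplier."""
    product, shift = 0, 0
    while multiplier:
        if multiplier & 1:                    # this partial product is the multiplicand
            product += multiplicand << shift  # shifted left by the bit position
        # a 0 bit contributes an all-zero partial product, so nothing is added
        multiplier >>= 1
        shift += 1
    return product

p = shift_and_add(0b1100, 0b1100)
print(format(p, "b"), p)   # 10010000 144
```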



1.8 FLIP-FLOPS AND LATCHES 



Logic alone does not a system make. Boolean equations provide the means to transform a set of in- 
puts into deterministic results. However, these equations have no ability to store the results of previ- 
ous calculations upon which new calculations can be made. The preceding adder logic continually 
recalculates the sum of two inputs. If either input is removed from the circuit, the sum disappears as 
well. A series of numbers that arrive one at a time cannot be summed, because the adder has no 
means of storing a running total. Digital systems operate by maintaining state to advance through se- 
quential steps in an algorithm. State is the system’s ability to keep a record of its progress in a partic- 
ular sequence of operations. A system’s state can be as simple as a counter or an accumulated sum. 

State-full logic elements called flip-flops are able to indefinitely hold a specific state (0 or 1) until a new state is explicitly loaded into them. Flip-flops load a new state when triggered by the transition of an input clock. A clock is a repetitive binary signal with a defined period that is composed of 0 and 1 phases as shown in Fig. 1.10. In addition to a defined period, a clock also has a certain duty cycle, which describes how the period is divided between the 0 and 1 phases and is usually stated as the fraction of the period spent in the 1 phase. An ideal clock has a 50/50 duty cycle, indicating that its period is divided evenly between the two states. Clocks regulate the operation of a digital system by allowing time for new results to be calculated by logic gates and then capturing the results in flip-flops.

There are several types of flip-flops, but the most common type in use today is the D flip-flop. Other types of flip-flops include RS and JK, but this discussion is restricted to D flip-flops because of their widespread, standardized usage. A D flip-flop is often called a flop for short, and this terminology is used throughout the book. A basic rising-edge triggered flop has two inputs and one output as shown in Fig. 1.11a. By convention, the input to a flop is labeled D, the output is labeled Q, and the clock is represented graphically by a triangle. When the clock transitions from 0 to 1, the state at the D input is propagated to the Q output and stored until the next rising edge. State-full logic is often described through the use of a timing diagram, a drawing of logic state versus time. Figure 1.11b shows a basic flop timing diagram in which the clock’s rising edge triggers a change in the flop’s state. Prior to the rising edge, the flop has its initial state, Q0, and an arbitrary 0 or 1 input is applied as D0. The rising edge loads D0 into the flop, which is reflected at the output. Once triggered, the flop’s input can change without affecting the output until the next rising edge. Therefore, the input is labeled as “don’t care,” or “XXX,” following the clock’s rising edge.
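The edge-triggered behavior in the timing diagram can be captured in a few lines of Python. This is a minimal behavioral sketch: the class `DFlipFlop` and its `tick` method are hypothetical names, clock and data are modeled as 0/1 levels sampled at discrete instants, and no propagation delays are modeled.

```python
class DFlipFlop:
    """Behavioral model of a rising-edge triggered D flip-flop."""

    def __init__(self, initial_q: int = 0) -> None:
        self.q = initial_q       # stored state (Q0 in the timing diagram)
        self._last_clock = 0     # previous clock level, used to detect edges

    def tick(self, clock: int, d: int) -> int:
        """Apply a new clock level and D input; return Q."""
        rising_edge = self._last_clock == 0 and clock == 1
        if rising_edge:
            self.q = d           # D is sampled only on the 0 -> 1 transition
        self._last_clock = clock
        return self.q


ff = DFlipFlop(initial_q=0)
print(ff.tick(clock=0, d=1))  # 0: clock low, Q holds its initial value
print(ff.tick(clock=1, d=1))  # 1: rising edge loads D
print(ff.tick(clock=1, d=0))  # 1: D changes after the edge are "don't care"
print(ff.tick(clock=0, d=0))  # 1: a falling edge does not load D
```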



FIGURE 1.10 Digital clock signal.
Rising-edge flops are the norm, although some flops are falling-edge triggered. A falling-edge
triggered flop is indicated by placing an inversion bubble at the clock input as shown in Fig. 1.12. 
Operation is the same, with the exception that the polarity of the clock is inverted. The remainder of 
this discussion assumes rising-edge triggered flops unless explicitly stated otherwise. 

There are several common feature enhancements to the basic flop, including clock-enable, set, 
and clear inputs and a complementary output. Clock enable is used as a triggering qualifier each 
time a rising clock edge is detected. The D input is loaded only if clock enable is set to its active 
state. Inputs in general are defined by device manufacturers to be either active-low or active-high. An 
active-low signal is effective when set to 0, and an active-high signal is effective when set to 1 . Sig- 
nals are assumed to be active-high unless otherwise indicated. Active-low inputs are commonly indi- 
cated by the same inversion bubble used to indicate a falling-edge clock. When a signal is driven to 
its active state, it is said to be asserted. A signal is de-asserted when driven to its inactive state. Set 
and clear inputs explicitly force a flop to a 1 or 0 state, respectively. Such inputs are often used to ini- 
tialize a digital system to a known state when it is first turned on. Otherwise, the flop powers up in a 
random state, which can cause problems for certain logic. Set and clear inputs can be either synchro- 
nous or asynchronous. Synchronous inputs take effect only on the rising clock edge, while asynchro- 
nous inputs take effect immediately upon being asserted. A complementary output is simply an 
inverted copy of the main output. 

A truth table for a flop enhanced with the features just discussed is shown in Table 1.10. The truth table assumes a synchronous, active-high clock enable (EN) and synchronous, active-low set and clear inputs. The rising edge of the clock is indicated by the ↑ symbol. When the clock is at either static value, the outputs of the flop remain in their existing states. When the clock rises, the D, EN, CLR, and SET inputs are sampled and acted on accordingly. As a general rule, conflicting information such as asserting CLR and SET at the same time should be avoided, because unknown results may arise. The exact behavior in this case depends on the specific flop implementation and may vary by manufacturer.
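Table 1.10 can be expressed as a behavioral model. This sketch assumes the same conventions as the table (synchronous active-high EN, synchronous active-low CLR and SET, with CLR and SET taking priority over EN as the X entries imply); the class and method names are hypothetical. Because the table marks simultaneous CLR and SET as unknown, the model rejects that combination rather than guessing a result. Between rising edges no method is called, so Q simply keeps its static value.

```python
from typing import Tuple


class EnhancedFlop:
    """Rising-edge flop with synchronous active-high enable and
    synchronous active-low clear/set (hypothetical model of Table 1.10)."""

    def __init__(self, q: int = 0) -> None:
        self.q = q

    @property
    def q_n(self) -> int:
        # The complementary output is simply the inverse of Q.
        return 1 - self.q

    def rising_edge(self, d: int, en: int, clr_n: int, set_n: int) -> Tuple[int, int]:
        """Sample all inputs on a rising clock edge; return (Q, Q')."""
        if clr_n == 0 and set_n == 0:
            # Table 1.10 lists this result as unknown; the model refuses it.
            raise ValueError("CLR and SET asserted together: result undefined")
        if clr_n == 0:
            self.q = 0           # clear wins regardless of EN and D
        elif set_n == 0:
            self.q = 1           # set wins regardless of EN and D
        elif en == 1:
            self.q = d           # load D only when enabled
        # en == 0 with CLR and SET inactive: Q keeps its static value
        return self.q, self.q_n


ff = EnhancedFlop()
print(ff.rising_edge(d=1, en=0, clr_n=1, set_n=1))  # (0, 1): disabled, Q holds
print(ff.rising_edge(d=1, en=1, clr_n=1, set_n=1))  # (1, 0): D loaded
print(ff.rising_edge(d=1, en=1, clr_n=0, set_n=1))  # (0, 1): synchronous clear
```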

A basic application of flops is a binary ripple counter. Multiple flops can be cascaded as shown in Fig. 1.13 such that each complementary output is fed back to that flop’s input and also used to clock the next flop. The current count value is represented by the noninverted flop outputs, with the first flop representing the LSB. A three-bit counter is shown with an active-low reset input so that the counter can be cleared to begin at zero. The counter circuit diagram uses the standard convention of showing electrical connectivity between intersecting wires by means of a junction dot. Wires that cross without a dot at their intersection are not electrically connected.

FIGURE 1.11 Rising-edge triggered flop: (a) symbol; (b) timing diagram showing D = D0 before the rising clock edge and XXX after it, and Q = Q0 before the edge and D0 after it.

FIGURE 1.12 Falling-edge triggered flop: (a) symbol; (b) timing diagram showing D = D0 before the falling clock edge and XXX after it, and Q = Q0 before the edge and D0 after it.

FIGURE 1.13 Three-bit ripple counter.

TABLE 1.10 Enhanced Flop Truth Table

Clock   D   EN   CLR   SET   Q         Q'
0       X   X    X     X     Qstatic   Qstatic'
↑       X   0    1     1     Qstatic   Qstatic'
↑       0   1    1     1     0         1
↑       1   1    1     1     1         0
↑       X   X    0     1     0         1
↑       X   X    1     0     1         0
↑       X   X    0     0     ?         ?
1       X   X    X     X     Qstatic   Qstatic'

The ripple counter’s operation is illustrated in Fig. 1.14. Each bit starts out at zero if RESET is asserted. Counting begins on the first rising edge of CLK following the de-assertion of RESET. The LSB, Q[0], increments from 0 to 1, because its D input is driven by the complementary output, which is 1. The complementary output transitions to 0, which does not trigger the Q[1] rising-edge flop, but it does set up the conditions for a trigger after the next CLK rising edge. When CLK rises again, Q[0] transitions back to 0, and its complementary output transitions to 1, forming a rising edge to trigger Q[1], which loads a 1. This sequence continues until the count value reaches 7, at which point the counter rolls over to zero, and the sequence begins again.
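The counting sequence, and the way each stage triggers the next, can be traced with a zero-delay Python model. This sketch assumes the connections of Fig. 1.13 (each D tied to that flop’s complementary output, each later flop clocked by the previous flop’s complementary output); the class name `RippleCounter` is hypothetical. It also reports how many flops toggled on each CLK edge, which is the length of the chain the change must ripple through.

```python
class RippleCounter:
    """Zero-delay model of an N-bit ripple counter built from toggling flops."""

    def __init__(self, bits: int = 3) -> None:
        self.q = [0] * bits      # q[0] is the LSB; all zero after RESET

    def clock_rising_edge(self) -> int:
        """Apply one CLK rising edge; return how many flops toggled."""
        toggled = 0
        for stage in range(len(self.q)):
            # D is tied to Q', so each triggered flop toggles.
            self.q[stage] ^= 1
            toggled += 1
            # The next flop is clocked by this flop's Q'. Q' rises only when
            # Q falls from 1 to 0; otherwise the ripple stops here.
            if self.q[stage] == 1:
                break
        return toggled

    def value(self) -> int:
        return sum(bit << i for i, bit in enumerate(self.q))


counter = RippleCounter(bits=3)
for _ in range(8):
    stages = counter.clock_rising_edge()
    print(counter.value(), stages)
# value, flops toggled: 1 1, 2 2, 3 1, 4 3, 5 1, 6 2, 7 1, 0 3
```

In real hardware the later stages settle later, so the edges where three flops toggle (3 to 4 and 7 to 0) are the ones that take longest to stabilize.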

An undesirable characteristic of the ripple counter is that it takes longer for a new count value to 
stabilize as the number of bits in the counter increases. Because each flop’s output clocks the next 
flop in the sequence, it can take some time for all flops to be updated following the CLK rising edge. 
Slow systems may not find this burdensome, but the added ripple delay is unacceptable in most high- 
speed applications. Ways around this problem will be discussed shortly. 
FIGURE 1.14 Ripple counter timing diagram.



A relative of the flop is the D-type latch, which is also capable of retaining its state indefinitely. A 
latch has a D input, a Q output, and an enable (EN) signal. Whereas a flop transfers its input to its 
output only on the active clock edge, a latch continuously transfers D to Q while EN is active. 
Latches are level sensitive, whereas flops are edge sensitive. A latch retains its state while EN is in- 
active. Table 1.11 shows the latch’s truth table. Latches are simpler than flops and are unsuited to 
many applications in which flops are used. A latch cannot substitute for a flop in the preceding ripple counter example because, while the enable input is high, a continuous loop would be formed between the complementary output and the input. This would result in rapid, uncontrolled oscillation at each latch during the time that the enable is held high.



TABLE 1.11 D-Latch Truth Table

EN   D   Q
0    X   Q0
1    0   0
1    1   1



Latches are available as discrete logic elements and can also be assembled from simpler logic 
gates. The Boolean equation for a latch requires feeding back the output as follows: 

$$Q = (EN \,\&\, D) + (\overline{EN} \,\&\, Q)$$

When EN is high, D is passed to Q. Q then feeds back to the second AND function, which maintains 
the state when EN is low. Latches are used in designs based on older technology that was conceived 
when the latch's simplicity yielded a cost savings or performance advantage. Most state-full ele- 
ments today are flops unless there is a specific benefit to using a latch. 
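The latch equation can be checked against Table 1.11 by evaluating it with the fed-back Q as an input. This minimal sketch uses a hypothetical `latch_next` function and single-bit integers; it evaluates the equation once per input change and does not model gate delays, so it cannot reproduce the oscillation described for the ripple-counter misuse.

```python
def latch_next(en: int, d: int, q: int) -> int:
    """One evaluation of Q = (EN & D) + (~EN & Q) for single-bit values."""
    not_en = 1 - en                 # ~EN
    return (en & d) | (not_en & q)  # '+' in the Boolean equation is OR


q = 0
for en, d in [(1, 1), (1, 0), (1, 1), (0, 0), (0, 1)]:
    q = latch_next(en, d, q)
    print(f"EN={en} D={d} -> Q={q}")
# While EN=1, Q follows D (1, 0, 1); once EN=0, Q holds 1 as D changes.
```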



1.9 SYNCHRONOUS LOGIC 



It has been shown that clock signals regulate the operation of a state-full digital system by causing 
new values to be loaded into flops on each active clock edge. Synchronous logic is the general term 
for a collection of logic gates and flops that are controlled by a common clock. The ripple counter is
not synchronous, even though it is controlled by a clock, because each flop has its own clock, which 
leads to the undesirable ripple output characteristic previously mentioned. A synchronous circuit has 
all of its flops transition at the same time so that they settle at the same time, with a resultant im- 
provement in performance. Another benefit of synchronous logic is easier circuit analysis, because 
all flops change at the same time. 

Designing a synchronous counter requires the addition of logic to calculate the next count value 
based on the current count value. Figure 1.15 shows a high-level block diagram of a synchronous 
counter and is also representative of synchronous logic in general. Synchronous circuits consist of 
state-full elements (flops), with combinatorial logic providing feedback to generate the next state 
based on the current state. Combinatorial logic is the term used to describe logic gates that have no 
state on their own. Inputs flow directly through combinatorial logic to outputs and must be captured 
by flops to preserve their state. 

Converting the three-bit ripple counter into a synchronous equivalent provides an example of synchronous logic design. Counters are a common logic structure, and they can be designed in a variety of ways. The Boolean equations for small counters may be solved directly using a truth table and K-map. Larger counters may be assembled in regular structures using binary adders that generate the next count value by adding 1 to the current value. A three-bit counter is easily handled with a truth-table methodology. The basic task is to create a truth table relating each possible current state to a next state as shown in Table 1.12.



TABLE 1.12 Three-Bit Counter Truth Table

Reset   Current State   Next State
1       XXX             000
0       000             001
0       001             010
0       010             011
0       011             100
0       100             101
0       101             110
0       110             111
0       111             000




FIGURE 1.15 Synchronous counter block diagram. 
Three Boolean equations are necessary, one for each bit that feeds back to the count state flops. If
the flop inputs are labeled D[2:0], the outputs are labeled Q[2:0], and an active-high synchronous re- 
set is defined, the following equations can be developed: 



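$$D[0] = \overline{Q[0]} \,\&\, \overline{RESET}$$

$$D[1] = \{(Q[0] \,\&\, \overline{Q[1]}) + (\overline{Q[0]} \,\&\, Q[1])\} \,\&\, \overline{RESET} = (Q[0] \oplus Q[1]) \,\&\, \overline{RESET}$$

$$D[2] = \{(\overline{Q[2]} \,\&\, Q[1] \,\&\, Q[0]) + (Q[2] \,\&\, \overline{Q[1]}) + (Q[2] \,\&\, \overline{Q[0]})\} \,\&\, \overline{RESET}$$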

Each equation's output is forced to 0 when RESET is asserted. Otherwise, the counter increments on 
each rising clock edge. Synchronous logic design allows any function to be implemented by chang- 
ing the feedback logic. It would not be difficult to change the counter logic to count only odd or even 
numbers, or to count only as high as 5 before rolling over to 0. Unlike the ripple counter, whose 
structure supports a fixed counting sequence, next state logic can be defined arbitrarily according to 
an application’s needs. 
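The next-state equations can be verified against Table 1.12 in software. In this sketch, the hypothetical `next_state` function plays the role of the combinatorial feedback block in Fig. 1.15, and each call corresponds to one rising clock edge that loads D[2:0] into the flops.

```python
def next_state(q: int, reset: int) -> int:
    """Evaluate D[2:0] from Q[2:0] using the equations above (active-high reset)."""
    q0, q1, q2 = q & 1, (q >> 1) & 1, (q >> 2) & 1
    not_reset = 1 - reset
    d0 = (1 - q0) & not_reset
    d1 = (q0 ^ q1) & not_reset
    d2 = (((1 - q2) & q1 & q0) | (q2 & (1 - q1)) | (q2 & (1 - q0))) & not_reset
    return (d2 << 2) | (d1 << 1) | d0


state = next_state(0b101, reset=1)   # synchronous reset forces 000
sequence = []
for _ in range(8):
    state = next_state(state, reset=0)
    sequence.append(format(state, "03b"))
print(sequence)  # ['001', '010', '011', '100', '101', '110', '111', '000']
```

Changing the counting sequence (odd numbers only, or rolling over after 5) amounts to replacing the body of this function, which mirrors the point that only the feedback logic changes.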



1.10 SYNCHRONOUS TIMING ANALYSIS



Logic elements, including flip-flops and gates, are physical devices that have finite response times to 
stimuli. Each of these elements exhibits a certain propagation delay between the time that an input is 
presented and the time that an output is generated. As more gates are chained together to create more 
complex logic functions, the overall propagation delay of signals between the end points increases. 
Flip-flops are triggered by the rising edge of a clock to load their new state, requiring that the input 
to the flip-flop be stable prior to the rising edge. Similarly, a flip-flop’s output stabilizes at a new state
some time after the rising edge. In between the output of a flip-flop and the input of another flip-flop 
is an arbitrary collection of logic gates, as seen in the preceding synchronous counter circuit. Syn- 
chronous timing analysis is the study of how the various delays in a synchronous circuit combine to 
limit the speed at which that circuit can operate. As might be expected, circuits with lesser delays are 
able to run faster. 

A clock breaks time into discrete intervals that are each the duration of a single clock period. 
From a timing analysis perspective, each clock period is identical to the last, because each rising 
clock edge is a new flop triggering event. Therefore, timing analysis considers a circuit’s delays over 
one clock period, between successive rising (or falling) clock edges. Because a wide range of clock frequencies can be applied to a circuit, the question arises of how fast the clock can go
before the circuit stops working reliably. The answer is that the clock must be slow enough to allow 
sufficient time for the output of a flop to stabilize, for the signal to propagate through the combinato- 
rial logic gates, and for the input of the destination flop to stabilize. The clock must also be slow 
enough for the flop to reliably detect each edge. Each flop circuit is characterized by a minimum 
clock pulse width that must be met. Failing to meet this minimum time can result in the flop missing 
clock events. 

Timing analysis revolves around the basic timing parameters of a flop: input setup time (tSU), input hold time (tH), and clock-to-out time (tCO). Setup time specifies the interval immediately preceding the rising edge of the clock during which the input must be stable. If the input changes too close to the clock edge, the electrical circuitry within the flop will not have enough time to properly recognize the state of the input. Hold time places a restriction on how soon after the clock edge the input may begin to change. Again, if the input changes too soon after the clock edge, it may not be properly detected by the circuitry. Clock-to-out time specifies how soon after the clock edge the output will be updated to the state presented at the input. These parameters are very brief in duration and are usually measured in nanoseconds. One nanosecond, abbreviated “ns,” is one billionth of a second. In very fast microchips, they may be measured in picoseconds, or trillionths of a second.

Consistent terminology is necessary when conducting timing analysis. Timing is expressed in units of both clock frequency and time. Clock frequency, or speed, is quantified in units of hertz, named after the nineteenth-century German physicist Heinrich Hertz. One hertz is equivalent to one clock cycle per second: one transition from low to high and a second transition from high to low. Units of hertz are abbreviated as Hz and are commonly accompanied by prefixes that denote an order of magnitude. Commonly observed prefixes used to quantify clock frequency and their definitions are listed in Table 1.13. Unlike quantities of bytes that use binary-based units, clock frequency uses decimal-based units.



TABLE 1.13 Common Clock Frequency Magnitude Prefixes

Prefix   Definition   Order of Magnitude   Abbreviation   Usage
Kilo     Thousand     10^3                 k              kHz
Mega     Million      10^6                 M              MHz
Giga     Billion      10^9                 G              GHz
Tera     Trillion     10^12                T              THz



Units of time are used to express a clock’s period as well as basic logic element delays such as 
the aforementioned tSU, tH, and tCO. As with frequency, standard prefixes are used to indicate the
order of magnitude of a time specification. However, rather than expressing positive powers of ten, 
the exponents are negative. Table 1.14 lists the common time magnitude prefixes employed in tim- 
ing analysis. 



TABLE 1.14 Common Time Magnitude Prefixes

Prefix   Definition       Order of Magnitude   Abbreviation   Usage
Milli    One-thousandth   10^-3                m              ms
Micro    One-millionth    10^-6                µ              µs
Nano     One-billionth    10^-9                n              ns
Pico     One-trillionth   10^-12               p              ps



Aside from basic flop timing characteristics, timing analysis must take into consideration the fi- 
nite propagation delays of logic gates and wires that connect flop outputs to flop inputs. All real 
components have nonzero propagation delays (the time required for an electrical signal to move 
from an input to an output on the same component). Wires have an approximate propagation delay 
of 1 ns for every 6 in of length. Logic gates can have propagation delays ranging from more than 
10 ns down to the picosecond range, depending on the technology being used. Newly designed logic circuits should be analyzed for timing to ensure that the inherent propagation delays of the logic gates and interconnect wiring do not cause a flop’s tSU and tH specifications to be violated at a given clock frequency.

Basic timing analysis can be illustrated with the example logic circuit shown in Fig. 1.16. There are two flops connected by two gates. The logic inputs shown unconnected are ignored in this instance, because timing analysis operates on a single path at a time. In reality, other paths exist through these unconnected inputs, and each path must be individually analyzed. Each gate has a finite propagation delay, tPROP, which is assumed to be 5 ns for the sake of discussion. Each flop has tCO = 7 ns, tSU = 3 ns, and tH = 1 ns. For simplicity, it is assumed that there is zero delay through the wires that connect the gates and flops.

The timing analysis must cover one clock period by starting with one rising clock edge and ending with the next rising edge. How fast can the clock run? The first delay encountered is tCO of the source flop. This is followed by tPROP of the two logic gates. Finally, tSU of the destination flop must be met. These parameters may be summed as follows:

$$t_{CLOCK} = t_{CO} + 2 \times t_{PROP} + t_{SU} = 7\ \text{ns} + 2 \times 5\ \text{ns} + 3\ \text{ns} = 20\ \text{ns}$$

The frequency and period of a clock are inversely related such that f = 1/T. A 20-ns clock period corresponds to a 50-MHz clock frequency:

$$f = \frac{1}{20 \times 10^{-9}\ \text{s}} = 50 \times 10^{6}\ \text{Hz}$$

Running at exactly the calculated clock period leaves no room for design margin. Increasing the period by 5 ns reduces the clock to 40 MHz and provides headroom to account for propagation delay through the wires.
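The setup-limited calculation above reduces to a short formula. This Python sketch uses hypothetical helper names, works in nanoseconds, and, like the example, ignores wire delay.

```python
def min_clock_period_ns(t_co: float, t_prop: float, gates: int, t_su: float) -> float:
    """Setup-limited minimum period: tCO + gates x tPROP + tSU (wires ignored)."""
    return t_co + gates * t_prop + t_su


def period_ns_to_mhz(period_ns: float) -> float:
    """f = 1/T; 1/(T x 1e-9 s) Hz equals 1e3/T MHz."""
    return 1e3 / period_ns


t_min = min_clock_period_ns(t_co=7, t_prop=5, gates=2, t_su=3)
print(t_min, period_ns_to_mhz(t_min))   # 20 50.0
print(period_ns_to_mhz(t_min + 5))      # 40.0 (25-ns period with 5 ns of margin)
```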

Hold time compliance can be verified following setup time analysis. Meeting a flop’s hold time is often not a concern, especially in slower circuits such as the one above. The 1-ns tH specification is easily met, because the destination flop’s D input will not change until tCO + 2 × tPROP = 17 ns after the rising clock edge. Actual timing parameters have variance associated with them, and the best-case tCO and tPROP would be somewhat smaller numbers. However, there is so much margin in this case that tH compliance is not a concern.

Hold-time problems sometimes arise in fast circuits where tCO and tPROP are very small. When there are no logic gates between two flops, tPROP can be nearly zero. If the minimum tCO is nearly equal to the maximum tH, the situation should be carefully investigated to ensure that the destination flop’s input remains stable for a sufficient time period after the active clock edge.



1.11 CLOCK SKEW 



The preceding timing analysis example is simplified for ease of presentation by assuming that the 
source and destination flops in a logic path are driven by the same clock signal. Although a synchro- 
nous circuit uses a common clock for all flops, there are small, nonzero variances in clock timing at 
individual flops. Wiring delay variances are one source of this nonideal behavior. When a clock 
source drives two flops, the two wires that connect to each flop’s clock input are usually not identical in length. This length inequality causes one flop’s clock to arrive slightly before or after the other flop’s clock.

FIGURE 1.16 Hypothetical logic circuit.

Clock skew is the term used to characterize differences in edge timing between multiple clock in- 
puts. Skew caused by wiring delay variance can be effectively minimized by designing a circuit so 
that clock distribution wires are matched in length. A more troublesome source of clock skew arises 
when there are too many clock loads to be driven by a single source. Multiple clock drivers are nec- 
essary in these situations, with small variations in electrical characteristics between each driver. 
These driver variances result in clock skew across all the flops in a synchronous design. As might be 
expected, clock skew usually reduces the frequency at which a synchronous circuit can operate. 

Clock skew is subtracted from the nominal clock period for setup time analysis purposes, because 
the worst-case scenario shown in Fig. 1.17 must be considered. This scenario uses the same logic 
circuit in Fig. 1.16 but shows two separate clocks with 1 ns of skew between them. The worst timing 
occurs when the destination flop’s clock arrives before that of the source flop, thereby reducing the 
amount of time available for the D-input to stabilize. Instead of the circuit having zero margin with a 
20-ns period, clock skew increases the minimum period to 21 ns. The extra 1 ns compensates for the clock skew and restores a minimum source-to-destination time of 20 ns. A slower circuit such as
this one is not very sensitive to clock skew, especially after backing off to 40 MHz for timing margin 
as shown previously. Digital systems that run at relatively low frequencies may not be affected by 
clock skew, because they often have substantial margins built into their timing analyses. As clock 
speeds increase, the margin decreases to the point at which clock skew and interconnect delay be- 
come important limiting factors in system design. 
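Clock skew can be folded into the same calculation as a reduction in the time available to the data path. The sketch below is a hypothetical helper that returns setup slack in nanoseconds (negative means a violation) under the worst-case assumption described above, using the Fig. 1.16 values with the two gates combined into a 10-ns logic delay.

```python
def setup_slack_ns(period: float, t_co: float, t_logic: float,
                   t_su: float, skew: float = 0.0) -> float:
    """Setup slack in ns; a negative result indicates a setup violation.

    Worst case assumed: the destination clock arrives `skew` ns before the
    source clock, shortening the time available to the data path.
    """
    available = period - skew
    required = t_co + t_logic + t_su
    return available - required


for period in (20, 21, 25):
    print(period, setup_slack_ns(period, t_co=7, t_logic=10, t_su=3, skew=1))
# 20 -1
# 21 0
# 25 4
```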

Hold time compliance can become more difficult in the presence of clock skew. The basic problem occurs when clock skew reduces the source flop’s apparent tCO from the destination flop’s perspective, causing the destination’s input to change before tH is satisfied. Such problems are more likely in high-speed systems, but slower systems are not immune. Figure 1.18 shows a timing diagram for a circuit with 1 ns of clock skew where two flops are connected by a short wire with nearly zero propagation delay. The flops have tCO = 2 ns and tH = 1.5 ns. A scenario like this may be experienced when connecting two chips that are next to each other on a circuit board. In the absence of clock skew, the destination flop’s input would change tCO after the rising clock edge, exceeding tH by 0.5 ns. The worst-case clock skew causes the source flop clock to arrive before that of the destination flop, resulting in an input change just 1 ns after the rising clock edge and violating tH.
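A corresponding hold check compares the earliest time the destination input can change against tH. This hypothetical helper assumes the worst case described above (source clock early by the full skew) and uses minimum delay values; the clock period does not appear anywhere in the calculation.

```python
def hold_slack_ns(t_co_min: float, t_logic_min: float, t_h: float,
                  skew: float = 0.0) -> float:
    """Hold slack in ns; a negative result indicates a hold violation.

    Worst case assumed: the source clock arrives `skew` ns before the
    destination clock, so the data changes earlier relative to the
    destination's clock edge.
    """
    earliest_change = t_co_min + t_logic_min - skew
    return earliest_change - t_h


print(hold_slack_ns(t_co_min=2, t_logic_min=0, t_h=1.5))          # 0.5
print(hold_slack_ns(t_co_min=2, t_logic_min=0, t_h=1.5, skew=1))  # -0.5
```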

Solutions to skew-induced tH violations include reducing the skew or increasing the delay between source and destination. Unfortunately, increasing a signal’s propagation delay may cause tSU violations in high-speed systems.
FIGURE 1.17 Clock skew influence on setup time analysis.

FIGURE 1.18 Hold-time violation caused by clock skew.

Hold time may not be a problem in slower circuits, because slower circuits often have paths be- 
tween flops with sufficiently long propagation delays to offset clock skew problems. However, 
even slow circuits can experience hold-time problems if flops are connected with wires or components that have small propagation delays. It is also important to remember that hold-time compliance is not a function of clock period but of clock skew, tCO, and tH. Therefore, a slow system that uses fast components may have problems if the clock skew exceeds the difference between tCO and tH.

1.12 CLOCK JITTER 

An ideal clock signal has a fixed frequency and duty cycle, resulting in its edges occurring at the ex- 
act time each cycle. Real clock signals exhibit slight variations in the timing of successive edges. 
This variation is known as jitter and is illustrated in Fig. 1.19. Jitter is caused by nonideal behavior 
of clock generator circuitry and results in some cycles being longer than nominal and some being 
shorter. The average clock frequency remains constant, but the cycle-to-cycle variance may cause 
timing problems. 

Just as clock skew worsens the analysis for both tSU and tH, so does jitter. Jitter must be subtracted from calculated timing margins to determine a circuit’s actual operating margin. Some systems are more sensitive to jitter than others. As operating frequencies increase, jitter becomes
more of a problem, because it becomes a greater percentage of the clock period and flop timing 
specifications. Jitter specifications vary substantially. Many systems can tolerate 0.5 ns of jitter 
and more. Very sensitive systems may require high-quality clock circuitry that can reduce jitter to 
below 100 ps. 
FIGURE 1.19 Clock jitter.

1.13 DERIVED LOGICAL BUILDING BLOCKS



Basic logic gates and flops can be combined to form more complex structures that are treated as 
building blocks when designing larger digital systems. There are various common functions that an 
engineer does not want to redesign from scratch each time. Some of the common building blocks are 
multiplexers, demultiplexers, tri-state buffers, registers, and shift registers. Counters represent an- 
other building block alluded to in the previous discussion of synchronous logic. A counter is a com- 
bination of flops and gates that can count either up or down, depending on the implementation. 

Multiplexers, sometimes called selectors, are combinatorial elements that function as multiposition logical switches to select one of many inputs. Figure 1.20 shows a common schematic representation of a multiplexer, often shortened to mux. A mux has an arbitrary number of data inputs, often a power of two, and a smaller number of selector inputs. According to the binary state of the selector inputs, a specific data input is transferred to the output.

Muxes are useful, because logic circuits often need to choose between multiple data values. A 
counter, for example, may choose between loading a next count value or loading an arbitrary value 
from external logic. A possible truth table for a 4-to-1 mux is shown in Table 1.15. Each selector input value maps to one, and only one, data input.
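A 4-to-1 mux can be written as a sum of products in which each selector combination enables exactly one data input. The following single-bit Python sketch (the function name `mux4` is hypothetical) reproduces Table 1.15 using the AND and OR operations used elsewhere in this chapter.

```python
def mux4(a: int, b: int, c: int, d: int, s1: int, s0: int) -> int:
    """Single-bit 4-to-1 mux: Y = A, B, C, or D for S1S0 = 00, 01, 10, 11."""
    n1, n0 = 1 - s1, 1 - s0      # inverted selector inputs
    return ((n1 & n0 & a) |      # S1S0 = 00 selects A
            (n1 & s0 & b) |      # S1S0 = 01 selects B
            (s1 & n0 & c) |      # S1S0 = 10 selects C
            (s1 & s0 & d))       # S1S0 = 11 selects D


print(mux4(a=0, b=1, c=0, d=0, s1=0, s0=1))  # 1 (input B)
```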



TABLE 1.15 Four-to-One Multiplexer Truth Table

S1   S0   Y
0    0    A
0    1    B
1    0    C
1    1    D



A demultiplexer, also called a demux, performs the inverse operation of a mux by transferring a 
single input to the output that is selected by select inputs. A demux is drawn similarly to a mux, as 
shown in Fig. 1.21. 





FIGURE 1.20 Four-to-one multiplexer. 



FIGURE 1.21 One-to-four demultiplexer. 
A possible truth table for a 1-to-4 demux is shown in Table 1.16. Those outputs that are not selected are held low. The output that is selected assumes the state of the data input.



TABLE 1.16 One-to-Four Demultiplexer Truth Table

S1   S0   A     B     C     D
0    0    Din   0     0     0
0    1    0     Din   0     0
1    0    0     0     Din   0
1    1    0     0     0     Din



A popular use for a demux is as a decoder. The main purpose of a decoder is not so much to trans- 
fer an input to one of several outputs but simply to assert one output while not asserting those that 
are not selected. This function has great utility in microprocessor address decoding, which involves 
selecting one of multiple devices (e.g., a memory chip) at a time for access. The truth table for a 2- 
to-4 decoder is shown in Table 1.17. The decoder’s outputs are active-low, because most memory 
and microprocessor peripheral chips use active-low enable signals. 
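The demux of Table 1.16 and the active-low decoder of Table 1.17 differ only in how the selected and unselected outputs are driven. In this sketch (function names hypothetical), the demux passes Din to the selected output and holds the rest low, while the decoder drives the selected output to 0 and holds the rest at 1.

```python
from typing import Tuple


def demux4(din: int, s1: int, s0: int) -> Tuple[int, int, int, int]:
    """1-to-4 demux: the selected output follows Din, the rest are held low."""
    select = (s1 << 1) | s0
    a, b, c, d = (din if i == select else 0 for i in range(4))
    return a, b, c, d


def decoder_2to4_active_low(s1: int, s0: int) -> Tuple[int, int, int, int]:
    """2-to-4 decoder: only the selected output is asserted (driven to 0)."""
    select = (s1 << 1) | s0
    a, b, c, d = (0 if i == select else 1 for i in range(4))
    return a, b, c, d


print(demux4(din=1, s1=0, s0=1))            # (0, 1, 0, 0)
print(decoder_2to4_active_low(s1=1, s0=0))  # (1, 1, 0, 1)
```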



TABLE 1.17 Two-to-Four Decoder Truth Table

S1   S0   A   B   C   D
0    0    0   1   1   1
0    1    1   0   1   1
1    0    1   1   0   1
1    1    1   1   1   0




FIGURE 1.22 Tri-state buffer. 



Tri-state buffers are combinatorial elements that can drive three out- 
put states rather than the standard 0 and 1 states. The third state is 
off, often referred to as high-impedance, hi-Z , or just Z. Tri-state 
buffers enable multiple devices to share a common output wire by 
cooperatively agreeing to have only one device drive the wire at any 
one time, during which all other devices remain in hi-Z. A tri-state 
buffer is drawn as shown in Fig. 1.22. 

A tri-state buffer passes its D input to its Y output when enabled. Otherwise, the output is turned off, as shown in Table 1.18.

Electrically, tri-state behavior allows multiple tri-state buffers to be connected to the same wire without contention. Contention occurs when multiple outputs connected together try to drive different states, some high and some low, creating a potentially damaging short circuit. However, if multiple tri-state buffers are connected, and only one at a time is enabled, there is no possibility of contention. The main advantage here is that digital buses in computers can be arbitrarily expanded by adding more devices without the need to add a full set of input or output signals each time a new device is added. In a logical context, a bus is a collection of wires that serve a common purpose. For example, a computer’s data bus might be eight wires that travel together and collectively represent a byte of data. Electrical contention on a bus is often called a bus fight. Schematically, multiple tri-state buffers might be drawn as shown in Fig. 1.23.

TABLE 1.18 Tri-state Buffer Truth Table

EN   D   Y
0    X   Z
1    0   0
1    1   1

Each tri-state buffer contains its own enable signal, which is usually driven by some type of de- 
coder. The decoder guarantees that only one tri-state buffer is active at any one time, preventing con- 
tention on the common wire. 
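The interaction between tri-state outputs, a shared wire, and decoder-driven enables can be sketched behaviorally. This Python model is an illustration under simplifying assumptions: logic levels are 0 and 1, the string "Z" stands for high impedance, and the enables are active high to match Table 1.18 (an active-low decoder such as the one in Table 1.17 would need its outputs inverted to drive these enables). All function names are hypothetical.

```python
Z = "Z"  # marker for the high-impedance (off) state


def tristate_buffer(en: int, d: int):
    """Table 1.18: pass D to Y when EN = 1, otherwise output Z."""
    return d if en else Z


def resolve_bus(outputs):
    """Value seen on a wire shared by several tri-state outputs.

    Returns Z if nothing drives the wire. Raises RuntimeError if enabled
    drivers disagree (some high, some low), which is electrical contention.
    """
    driven = [v for v in outputs if v != Z]
    if not driven:
        return Z
    if len(set(driven)) > 1:
        raise RuntimeError("bus contention: drivers disagree")
    return driven[0]


def decoder_enables(select: int, n: int):
    """One-hot enables from a decoder: only buffer `select` is turned on."""
    return [1 if i == select else 0 for i in range(n)]


# Four devices share one wire; each holds the bit it would like to send.
device_bits = [1, 0, 1, 1]
for sel in range(len(device_bits)):
    enables = decoder_enables(sel, len(device_bits))
    outputs = [tristate_buffer(en, d) for en, d in zip(enables, device_bits)]
    print(f"device {sel} selected -> wire = {resolve_bus(outputs)}")
```

Because the decoder never enables more than one buffer, `resolve_bus` can never see conflicting drivers; enabling two buffers with different data would raise the contention error.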

Registers are groups of flops that share a common function. They are a basic synchronous-logic building block and are typically built in multiples of 8 bits, so that each 8-bit slice represents a byte, the most common unit of information exchange in digital systems. An 8-bit register provides a common clock and clock enable for all eight internal flops. The clock enable controls whether, on each clock edge, the flops load new D-input values or retain their current values. It is common to find registers that have a built-in tri-state buffer, allowing them to be placed directly onto a shared bus without the need for an additional tri-state buffer component.
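A register's clock enable and built-in tri-state output can be captured in a small behavioral model. The Python below is a sketch under simplifying assumptions: the eight flops are stored as one integer, a method call stands in for a rising clock edge, and the names `Register8`, `rising_edge`, and `output` are hypothetical.

```python
Z = "Z"  # marker for the high-impedance (off) output state


class Register8:
    """Hypothetical behavioral model of an 8-bit register with a shared
    clock, a clock enable, and a built-in tri-state output buffer."""

    def __init__(self):
        self.q = 0  # state of the eight flops, bit i = flop i

    def rising_edge(self, d: int, clock_enable: int) -> None:
        # All eight flops see the same clock edge: with the enable asserted
        # they all load the new D-inputs, otherwise they all hold.
        if clock_enable:
            self.q = d & 0xFF  # keep only eight bits

    def output(self, output_enable: int):
        # The tri-state buffer drives Q onto the bus only when enabled.
        return self.q if output_enable else Z


reg = Register8()
reg.rising_edge(0x5A, clock_enable=1)  # loads 0x5A
reg.rising_edge(0xFF, clock_enable=0)  # enable low: register keeps 0x5A
print(hex(reg.output(1)), reg.output(0))
```

Keeping the register off a shared bus is then a matter of holding its output enable low, which is what the decoder-driven enables described above arrange.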

Whereas normal registers simply store values, synchronous elements called shift registers manipulate groups of bits. Shift registers exist in all permutations of serial and parallel inputs and outputs. The role of a shift register is to move bits through a chain of flops, one position per clock. This includes assembling an array of bits from a single bit at a time (serial input) or distributing an array of bits one bit at a time (serial output). A serial-in, parallel-out shift register can be implemented by chaining several flops together, as shown in Fig. 1.24.



FIGURE 1.23 Multiple tri-state buffers on a single wire.







FIGURE 1.24 Serial-in, parallel-out shift register.
On each rising clock edge, a new serial input bit is clocked into the first flop, and each flop in succession loads its new value from its predecessor. At any given time, the parallel output of an N-bit shift register reflects the state of the last N bits shifted in up to that time. In this example (N = 4), a serial stream of bits collected in four clock cycles can be operated upon as a unit of four bits once every fourth cycle. As shown, data is shifted in MSB first, because Dout[3], the last flop in the chain, holds the first bit shifted in. Such a simple transformation is useful, because it is often practical to communicate digital data in serial form, where only one bit of information is sent per clock cycle, but impractical to operate on that data serially. An advantage of serial communication is that fewer wires are required as compared to parallel. Yet parallel representation is important, because arithmetic logic becomes cumbersome if it has to process data one bit at a time. A parallel-in, serial-out shift register is very similar, as shown in Fig. 1.25, with the signals connected for MSB-first operation to match the previous example.
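The shifting behavior described above can be simulated directly. This Python sketch assumes, following Fig. 1.24, that Dout[0] is the flop fed by the serial input and Dout[3] is the last flop in the chain; the class name `SipoShiftRegister` is hypothetical.

```python
class SipoShiftRegister:
    """Hypothetical model of a serial-in, parallel-out shift register.

    dout[0] is the first flop (fed by the serial input) and dout[n-1] the
    last, so a stream sent MSB first ends up with its MSB in dout[n-1].
    """

    def __init__(self, n: int = 4):
        self.dout = [0] * n

    def clock(self, serial_in: int) -> None:
        # Rising edge: every flop loads its predecessor's old value at once,
        # and the first flop loads the new serial bit.
        self.dout = [serial_in] + self.dout[:-1]

    def value(self) -> int:
        # Interpret dout[i] as bit i of the parallel output word.
        return sum(bit << i for i, bit in enumerate(self.dout))


sr = SipoShiftRegister(4)
for bit in [1, 0, 1, 1]:          # the word 0b1011 sent MSB first
    sr.clock(bit)
print(sr.dout, bin(sr.value()))  # the full word is available after four clocks
```

Building the new list in one step mirrors the hardware: all flops sample their inputs on the same edge, so no flop sees a value that was updated earlier in the same cycle.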

Four flops are used here as well. However, instead of taking in one bit at a time, all flops are loaded at once when the load signal is asserted. The 2-to-1 muxes are controlled by the load signal and determine whether the flops are loaded with new parallel data or shifted serial data. Over each of the next four clock cycles, the individual bits are shifted out one at a time. If these two shift register circuits were connected together, a crude serial data communications link could be created whereby parallel data is converted to serial at one end and back to parallel at the other.
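The parallel-in, serial-out register and the crude serial link can be sketched the same way. In this Python model (all names hypothetical), each flop's next value is chosen by its 2-to-1 mux, the serial output is taken from the last flop so the MSB leaves first, and the receiving end is a serial-in, parallel-out chain that samples the line on the same clock edges as the sender. Filling the first flop with 0 while shifting is an assumption of the sketch.

```python
class PisoShiftRegister:
    """Hypothetical model of a parallel-in, serial-out shift register."""

    def __init__(self, n: int = 4):
        self.q = [0] * n  # q[i] is flop i; q[n-1] drives the serial output

    def clock(self, load: int, din: int = 0) -> None:
        n = len(self.q)
        if load:
            # Mux selects parallel data: bit i of din goes into flop i.
            self.q = [(din >> i) & 1 for i in range(n)]
        else:
            # Mux selects shifted data: flop i takes flop i-1's old value,
            # and flop 0 takes a 0 because nothing feeds it in this sketch.
            self.q = [0] + self.q[:-1]

    @property
    def serial_out(self) -> int:
        return self.q[-1]


def send_word(word: int, n: int = 4) -> int:
    """Crude serial link: parallel -> serial -> parallel, MSB first."""
    tx = PisoShiftRegister(n)
    rx = [0] * n  # receiving serial-in, parallel-out chain
    tx.clock(load=1, din=word)  # clock edge with load asserted
    for _ in range(n):
        # On each shared clock edge, the receiver captures the bit currently
        # on the line while the sender shifts the next bit into position.
        rx = [tx.serial_out] + rx[:-1]
        tx.clock(load=0)
    return sum(bit << i for i, bit in enumerate(rx))


print(bin(send_word(0b1011)))
```

One load cycle plus four shift cycles move a 4-bit word across a single data wire, which is the trade of wires for clock cycles described above.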




FIGURE 1.25 Parallel-in, serial-out shift register.
CHAPTER 2

Integrated Circuits and the 
7400 Logic Families 



Once basic logic design theory is understood, the next step is transferring that knowledge to a practi- 
cal context that includes real components. This chapter explains what an integrated circuit is and 
how off-the-shelf components can be used to implement arbitrary logic functions. 

Integrated circuits, called chips by engineers and laymen alike, are what enable digital systems as 
we know them. The chapter begins with an introduction to how chips are constructed. Familiarity 
with basic chip fabrication techniques and terminology enables an engineer to comprehend the dis- 
tinctions between various products so that their capabilities can be more readily evaluated. 

A survey of packaging technology follows to provide familiarity with the common physical char- 
acteristics of commercially available chips. Selecting a package that is appropriate for a particular 
design can be as critical as selecting the functional parameters of the chip itself. It is important to un- 
derstand the variety of available chip packages and why different types of packages are used for dif- 
ferent applications. 

The chapter’s major topic follows next: the 7400 logic families. These off-the-shelf logic chips 
have formed the basis of digital systems for decades and continue to do so, although in fewer num- 
bers as a result of the advent of denser components. 7400 family features are presented along with 
complete examples of how the chips are applied in real designs. The purpose of this discussion is to 
impart a practical and immediately applicable understanding of how digital system design can be ex- 
ecuted with readily available components. Although these devices are not appropriate for every ap- 
plication, many basic problems can be solved with 7400 chips once it is understood how to employ 
them. 

After showing how real chips can be used to solve actual design problems, the chapter closes with a closely related topic: the interpretation of data sheets. Manufacturers’ data sheets contain critical information that must be understood to ensure a working design. An understanding of how data sheets are organized and the types of information that they contain is a necessary part of every engineer’s knowledge base.



2.1 THE INTEGRATED CIRCUIT



Digital logic and electronic circuits derive their functionality from electronic switches called transistors. Roughly speaking, the transistor can be likened to an electronically controlled valve whereby energy applied to one connection of the valve enables energy to flow between two other connections. By combining multiple transistors, digital logic building blocks such as AND gates and flip-flops are formed. Transistors, in turn, are made from semiconductors. Consult a periodic table of elements in a college chemistry textbook, and you will find the semiconductors among the group of elements that separates the metals from the nonmetals. They are called semiconductors because of their ability to behave as both metals and nonmetals. A semiconductor can be made to conduct electricity like a metal or to insulate as a nonmetal does. These differing electrical properties can be accurately controlled by mixing the semiconductor with small amounts of other elements. This mixing is called doping. A semiconductor can be doped to contain more free electrons (N-type) or fewer free electrons (P-type). Examples of commonly used semiconductors are silicon and germanium. Phosphorus and boron are two elements that are used to dope N-type and P-type silicon, respectively.

A transistor is constructed by creating a sandwich of differently doped semiconductor layers. The two most common types of transistors, the bipolar-junction transistor (BJT) and the field-effect transistor (FET), are schematically illustrated in Fig. 2.1. This figure shows both the silicon structures of these elements and their graphical symbols as they would appear in a circuit diagram. The BJT shown is an NPN transistor, because it is composed of a sandwich of N-P-N doped silicon. When a small current is injected into the base terminal, a larger current is enabled to flow from the collector to the emitter. The FET shown is an N-channel FET; it is composed of two N-type regions separated by a P-type substrate. When a voltage is applied to the insulated gate terminal, a current is enabled to flow from the drain to the source. It is called N-channel because the gate voltage induces an N-type channel within the substrate, enabling current to flow between the N-regions.

Another basic semiconductor structure shown in Fig. 2.1 is a diode, which is formed simply by a junction of N-type and P-type silicon. Diodes act like one-way valves by conducting current only from P to N. Special diodes can be created that emit light when current flows through them. Appropriately enough, these components are called light emitting diodes, or LEDs. These small lights are manufactured by the millions and are found in diverse applications from telephones to traffic lights.

The resulting small chip of semiconductor material on which a transistor or diode is fabricated can 
be encased in a small plastic package for protection against damage and contamination from the out- 
side world. Small wires are connected within this package between the semiconductor sandwich and 
pins that protrude from the package to make electrical contact with other parts of the intended circuit. 
Once you have several discrete transistors, digital logic can be built by directly wiring these compo- 
nents together. The circuit will function, but any substantial amount of digital logic will be very 
bulky, because several transistors are required to implement each of the various types of logic gates. 

FIGURE 2.1 BJT, FET, and diode structural and symbolic representations.

At the time of the invention of the transistor in 1947 by John Bardeen, Walter Brattain, and William Shockley, the only way to assemble multiple transistors into a single circuit was to buy separate discrete transistors and wire them together. In 1958 and 1959, Jack Kilby and Robert Noyce independently invented a means of fabricating multiple transistors on a single slab of semiconductor material. Their invention would come to be known as the integrated circuit, or IC, which is the foundation of our modern computerized world. An IC is so called because it integrates multiple transistors and diodes onto the same small semiconductor chip. Instead of having to solder individual wires between discrete components, an IC contains many small components that are already wired together in the desired topology to form a circuit.

A typical IC, without its plastic or ceramic package, is a square or rectangular silicon die measuring from 2 to 15 mm on an edge. Depending on the level of technology used to manufacture the IC, there may be anywhere from a dozen to tens of millions of individual transistors on this small chip. This amazing density of electronic components indicates that the transistors and the wires that connect them are extremely small. Dimensions on an IC are measured in units of micrometers, with one micrometer (1 µm) being one millionth of a meter. To serve as a reference point, a human hair is roughly 100 µm in diameter. Some modern ICs contain components and wires that are measured in increments as small as 0.1 µm! Each year, researchers and engineers have been finding new ways to steadily reduce these feature sizes to pack more transistors into the same silicon area, as indicated in Fig. 2.2.
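The scale of these numbers is easier to grasp with a little arithmetic. The sketch below uses only the figures quoted in the paragraph above (a 100 µm hair, 0.1 µm features, a 15 mm die edge), kept in integer nanometers so the division is exact. The `density_gain` helper is a hypothetical illustration of the idealized rule that shrinking every dimension by a factor k fits about k squared times as many identical structures in the same area; real designs do not shrink every structure uniformly.

```python
# All lengths in nanometers so the arithmetic stays in exact integers.
NM_PER_UM = 1_000
NM_PER_MM = 1_000_000

hair_nm = 100 * NM_PER_UM      # human hair: roughly 100 um
feature_nm = 100               # 0.1 um minimum feature size
die_edge_nm = 15 * NM_PER_MM   # large die: 15 mm on an edge

features_across_hair = hair_nm // feature_nm
features_along_die_edge = die_edge_nm // feature_nm
print(features_across_hair, features_along_die_edge)


def density_gain(old_feature_nm: int, new_feature_nm: int) -> float:
    """Idealized gain in structures per unit area from a uniform shrink.

    Area per structure falls with the square of the linear feature size.
    """
    return (old_feature_nm / new_feature_nm) ** 2


print(density_gain(200, 100))  # halving the feature size: about 4x density
```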

Many individual chemical process steps are involved in fabricating an IC. The process begins 
with a thin, clean, polished semiconductor wafer — most often silicon — that is usually one of three 
standard diameters: 100, 200, or 300 mm. The circular wafer is cut from a cylindrical ingot of solid 
silicon that has a perfect crystal structure. This perfect crystal base structure is necessary to promote 
the formation of other crystals that will be deposited by subsequent processing steps. Many dice are 
arranged on the wafer in a grid as shown in Fig. 2.3. Each die is an identical copy of a master pattern 
and will eventually be sliced from the wafer and packaged as an IC. An IC designer determines how 
different portions of the silicon wafer should be modified to create transistors, diodes, resistors, capacitors, and wires. This IC design layout can then be used to, in effect, draw tiny components onto
the surface of the silicon. Sequential drawing steps are able to build sandwiches of differently doped 
silicon and metal layers. 
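How many dice a wafer yields depends on both the wafer diameter and the die size. The sketch below counts the whole square dice on a grid that fit inside a circular wafer. It is a deliberately simplified geometric count, assuming a grid centered on the wafer (four dice meet at the center), no space between dice for saw cuts, and no unusable rim at the wafer edge; real layouts differ, so treat the result as an idealized figure rather than a production estimate. The function name is hypothetical.

```python
import math


def dice_per_wafer(wafer_diameter_mm: float, die_edge_mm: float) -> int:
    """Count whole square dice on a centered grid inside a circular wafer."""
    r = wafer_diameter_mm / 2
    per_side = math.floor(r / die_edge_mm)  # most dice from center to edge along an axis
    count = 0
    # Count one quadrant; the centered grid is symmetric about both axes.
    for i in range(per_side):
        for j in range(per_side):
            # Corner of die (i, j) farthest from the wafer center.
            x = (i + 1) * die_edge_mm
            y = (j + 1) * die_edge_mm
            if x * x + y * y <= r * r:  # whole die lies on the wafer
                count += 1
    return 4 * count


# Compare the standard diameters for a hypothetical 10 mm die.
for diameter in (100, 200, 300):
    print(diameter, "mm wafer:", dice_per_wafer(diameter, 10), "dice")
```

Dice that would straddle the wafer edge are not counted, which is why smaller dice waste proportionally less of the circular wafer.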

Engineers realized that light provided the best way to faithfully replicate patterns from a template 
onto a silicon substrate, similar to what photographers have been doing for years. A photographer 
takes a picture by briefly exposing film with the desired image and then developing this film into a 
negative. Once this negative has been created, many identical photographs can be reproduced by 
briefly exposing the light-sensitive photographic paper to light that is focused through the negative. 
Portions of the negative that are dark do not allow light to pass, and these corresponding regions of 
the paper are not exposed. Those areas of the negative that are light allow the paper to be exposed. 




FIGURE 2.2 Decreasing IC feature size over time. (Future data for years 2003 through 2005 compiled from The 
International Technology Roadmap for Semiconductors, Semiconductor Industry Association, 2001.) 
When the paper is developed in a chemical bath, portions of the paper that were exposed change
color and yield a visible image. 

Photographic processes provide excellent resolution of detail. Engineers apply this same principle 
in fabricating ICs to create details that are fractions of a micron in size. Similar to a photographic 
negative, a mask is created for each IC processing step. Like a photographic negative, the mask does 
not have to be the same size as the silicon area it is to expose because, with lenses, light can be fo- 
cused through the mask to an arbitrary area. Using a technique called photolithography, the silicon 
surface is first prepared with a light-sensitive chemical called photoresist. The prepared surface is 
then exposed to light through the mask. Depending on whether a positive or negative photoresist 
process is employed, the areas of photoresist that have been either exposed or not exposed to light 
are washed away in a chemical bath, resulting in a pattern of bare and covered areas of silicon. The 
wafer can then be exposed to chemical baths, high temperature metal vapors, and ion beams. Only 
the bare areas that have had photoresist washed away are affected in this step. In this way, specific 
areas of the silicon wafer can be doped according to the IC designers' specifications. Successive 
mask layers and process steps can continue to wash away and expose new layers of photoresist and 
then build sandwiches of semiconductor and metal material. A very simplified view of these process 
steps is shown in Fig. 2.4. The semiconductor fabrication process must be performed in a clean- 
room environment to prevent minute dust particles and other contaminants from disturbing the li- 
thography and chemical processing steps. 
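The difference between positive and negative photoresist amounts to a logical inversion, which the small model below makes explicit. It is an abstraction under stated assumptions: the wafer is a grid of sites, `True` in the mask means light reaches that site, and a later processing step (here called `dope`, a hypothetical name) affects only the sites where resist has been washed away.

```python
def develop(mask, positive_resist: bool):
    """Return a grid that is True where bare silicon is left after development.

    A positive resist is washed away where it was exposed to light;
    a negative resist is washed away where it was not exposed.
    """
    return [[exposed if positive_resist else not exposed for exposed in row]
            for row in mask]


def dope(doped, bare):
    """Process step: bare sites become doped; covered sites are unchanged."""
    return [[d or b for d, b in zip(drow, brow)] for drow, brow in zip(doped, bare)]


def show(grid):
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)


mask = [
    [False, True, True, False],
    [False, True, True, False],
]
doped = [[False] * 4 for _ in range(2)]
doped = dope(doped, develop(mask, positive_resist=True))
print(show(doped))  # '#' marks doped sites
```

Switching to `positive_resist=False` with the same mask dopes the complementary pattern, which is why the choice of resist process determines whether the mask is drawn as the pattern or its negative.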

In reality, dozens of such steps are necessary to fabricate an IC. The semiconductor structures that 
must be formed by layering different metals and dopants are complex and must be formed one thin 
layer at a time. Modern ICs typically have more than four layers of metal, each layer separated from 
others by a thin insulating layer of silicon dioxide. The use of more metal layers increases the cost of 
an IC, but it also increases its density, because more metal wires can be fabricated to connect more 
transistors. This complete process from start to finish usually takes one to four weeks. The chemical 
diffusion step (5) is an example of how different regions of the silicon wafer are doped to achieve 
varying electrical characteristics. In reality, several successive doping steps are required to create 
transistors. The metal deposition step (10) is an example of how the microscopic metal wires that 
connect the many individual transistors are created. Hot metal vapors are passed over the prepared 
surface of the wafer. Over time, individual metal atoms adhere to the exposed areas and form continuous wires. Historically, most metal interconnects on silicon ICs have been made from aluminum. However, copper has become common in leading-edge ICs.

FIGURE 2.4 The IC fabrication process.

As IC feature sizes continue to shrink, the physical properties of light can become limiting factors in the resolution with which a wafer can be processed. Shorter light wavelengths are necessary to meet the demands of leading-edge IC process technology. The human eye can detect electromagnetic energy from about 700 nm (red) to 400 nm (violet). Whereas ultraviolet light (< 400 nm) was once adequate for IC fabrication, deep UV wavelengths are now in use, and wavelengths below 200 nm are being explored.

Each of the process steps is applied to the entire wafer. The many dice on a single wafer are usu- 
ally exposed to light through the same mask. The mask is either large enough to cover the entire wa- 
fer and therefore expose all dice at once, or the mask is stepped through the dice grid (using a 
machine appropriately called a stepper) such that each die location is exposed separately before the 
next processing step. In certain cases, such as small-volume or experimental runs, different die loca- 
tions on the same wafer will be exposed with different masks. This is entirely feasible but may not 
be as efficient as creating a wafer on which all dice are identical. 

When an IC is designed and fabricated, it generally follows one of two main transistor technolo- 
gies: bipolar or metal-oxide semiconductor (MOS). Bipolar processes create BJTs, whereas MOS 
processes create FETs. Bipolar logic was more common before the 1980s, but MOS technologies 
have since accounted for the great majority of digital logic ICs. N-channel FETs are fabricated in an
NMOS process, and P-channel FETs are fabricated in a PMOS process. In the 1980s, complemen- 
tary-MOS, or CMOS, became the dominant process technology and remains so to this day. CMOS 
ICs incorporate both NMOS and PMOS transistors. 



2.2 IC PACKAGING



When the wafer has completed its final process step, it is tested and then sliced up to separate the in- 
dividual dice. Dice that fail the initial testing are quickly discarded. Those that pass inspection are 
readied for packaging. A package is necessary for several reasons, including protection of the die and 
the creation of electromechanical connections with other circuitry. ICs are almost always mounted 
onto a circuit board, and it is usually difficult to mount unpackaged ICs directly to the board. How- 
ever, there are special situations in which ICs are not packaged and are directly attached to the board. 
These cases are often at opposite ends of the technological spectrum. At the low end of technology, 
ICs can be several process generations behind the current state of the art. Therefore, the relative com- 
plexity of mounting them to a circuit board may not be as great. The savings of direct mounting are in 
space and cost. A common quartz wristwatch benefits from direct mounting, because the small con- 
fines of a watch match very well with the space savings achieved by not requiring a package for the 
IC. These watch ICs use mature semiconductor process technologies. At the high end of technology, 
some favorable electrical and thermal characteristics can be achieved by eliminating as much inter- 
mediate bulk as possible between individual ICs and supporting circuitry. However, the technical dif- 
ficulties of direct-mounting a leading-edge IC can be challenging and greatly increase costs. 
Therefore, direct-mounting of all but very low-end electronics is relatively rare. 

IC packaging technology has evolved dramatically from the early days, yet many mature package 
types still exist and are in widespread use. Plastic and ceramic are the two most common materials 
used in an IC package. They surround the die and its lead frame. The lead frame is a structure of metal 
wires that fan out from the die and extend to the package exterior as pins for connection to a circuit 
board. Plastic packages are generally lower in cost as compared to ceramics, but they have poorer 
thermal performance. Thermal characteristics are important for ICs that handle large currents and dis- 
sipate large quantities of heat. To prevent the IC from overheating, the heat must be conducted and ra- 
diated away as efficiently as possible. Ceramic material conducts heat far better than plastic. 

A very common package is the dual in-line package, or DIP, shown in Fig. 2.5. A DIP has two 
parallel rows of pins that are spaced on 0.1-in centers. Each pin extends roughly 0.2 in below the 
bottom of the plastic or ceramic body. Pins are numbered sequentially from 1 going left to right 
along one side and resuming on the opposite side from right to left. There is usually at least one pin 
1 marker at one end of the package. It is either a dot near pin 1 or a semicircular indentation on one 
edge of the package. 
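The DIP numbering rule is regular enough to compute. The Python sketch below assumes a top view starting from the pin-1 end: `row` 0 is the side that holds pin 1, `row` 1 is the opposite side, and `col` counts positions from the pin-1 end starting at 0. The function names are hypothetical. A useful consequence is that pins directly across from each other always add up to N + 1 for an N-pin package.

```python
def dip_pin(n_pins: int, row: int, col: int) -> int:
    """Pin number at a given position on an N-pin DIP (top view)."""
    if n_pins < 2 or n_pins % 2:
        raise ValueError("a DIP has an even number of pins")
    per_side = n_pins // 2
    if row not in (0, 1) or not 0 <= col < per_side:
        raise ValueError("position out of range")
    if row == 0:
        return col + 1        # 1, 2, ..., N/2 moving away from the pin-1 end
    return n_pins - col       # N, N-1, ..., N/2+1 on the opposite side


def opposite_pin(n_pins: int, pin: int) -> int:
    # Pins directly across from each other always sum to N + 1.
    return n_pins + 1 - pin


n = 16
print(" ".join(f"{dip_pin(n, 1, c):2}" for c in range(n // 2)))  # opposite side
print(" ".join(f"{dip_pin(n, 0, c):2}" for c in range(n // 2)))  # pin-1 side
```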

DIPs are commonly manufactured in standard sizes ranging from 6 to 48 pins, and some manu- 
facturers go beyond 48 pins. Smaller pin-count devices have 0.3-in wide packages, and larger de- 
vices are 0.6 in wide. Because of the ubiquity of the DIP, there are many variations of pin counts and 
package widths. For many years, the DIP accounted for the vast majority of digital logic packages. 
Common logic ICs were manufactured in 14- and 16-pin DIPs. Memory ICs were manufactured in 
16-, 18-, 24-, and 28-pin DIPs. Microprocessors were available in 40-, 44-, and 48-pin DIPs. DIPs
are still widely available today, but their use as a percentage of the total IC market has declined 
markedly. However, the benefits of the DIP remain: they are inexpensive and easy to work with by 
hand, eliminating the need for costly assembly tools. 

FIGURE 2.5 A 16-pin dual in-line package.

If you were to carefully crack open a DIP, you would be able to see the mechanical assembly of the die and lead frame. This is illustrated in Fig. 2.6. The die is cemented in the center of a stamped metal frame and is connected to the individual pins with extremely thin wires. Once the electrical connections are made, the fragile assembly is encased in a plastic or ceramic body for protection, and the exterior portions of the pins are folded vertically.

All other IC packages are variations on this theme. Some packages use a similar lead-frame struc- 
ture, whereas more advanced packages utilize very high-quality miniature circuit boards made from 
either ceramic or fiberglass. 

An oft-quoted attribute of ICs is that their density doubles every 18 months as a result of improvements in process technology. The trend is known as Moore’s law, after Dr. Gordon Moore, a co-founder of Intel, who predicted it in 1965. (Moore’s original estimate was a doubling every year, which he later revised to every two years; the 18-month figure is a popular restatement.) The semiconductor industry has broadly kept pace with this prediction over time. Before the explosion of IC density, the semiconductor industry classified ICs into several categories depending on the number of logic gates on the device: small-scale integration (SSI), medium-scale integration (MSI), large-scale integration (LSI), and, finally, very large-scale integration (VLSI). Figure 2.7 provides a rough definition of these terms. As the density of ICs continued to grow at a rapid pace, it became rather ridiculous to keep adding words like “very” and “extra” to these categories, and the terms’ widespread use declined. ICs are now often categorized based on their minimum feature size and metal process. For example, one might refer to an IC as “0.25 µm, three-layer metal (aluminum)” or “0.13 µm, six-layer copper.”
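Small differences in the doubling period compound quickly. The sketch below is a minimal illustration that treats the doubling period as a free parameter and compares the 18-month figure with a two-year period over a hypothetical ten-year span; it models only steady exponential growth, not any actual product history.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplier on density after `years`, assuming steady doubling."""
    return 2 ** (years / doubling_period_years)


for period in (1.5, 2.0):
    factor = growth_factor(10, period)
    print(f"doubling every {period} years: about {factor:.0f}x in 10 years")
```

Over the same decade, the 18-month period yields roughly three times the growth of the two-year period, which is why the exact figure matters when the law is used for planning.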




FIGURE 2.6 DIP lead frame. 
FIGURE 2.7 Relative component count of ICs.

As IC densities grew at this tremendous pace, the number of pins on each IC and the speed at which they operated began to increase as well. DIPs soon became a limiting factor in the performance of ICs. First, adding pins made the package longer, because there are only two rows of pins. However, most chips are relatively square in shape to minimize on-chip interconnection distances. This creates a conflict: a long, narrow package is poorly suited to increasingly large, square dice. Second, some pins in the DIP lead frame, especially those near the corners, are relatively long. This has an adverse impact on the quality of high-speed signals. Third, the 0.1-in pin spacing on DIPs keeps them artificially large as circuit board technologies continue improving to handle smaller contacts.

One solution to the pin density problem was the development of the pin grid array, or PGA, package. Shown in Fig. 2.8, the PGA is akin to a two-dimensional DIP with pins spaced on 0.1-in centers. Very high pin counts are achievable with a PGA, because all of its area is usable rather than just the perimeter. Being a square, the PGA is compatible with large ICs, because it more closely matches the proportions of a silicon chip.
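A rough count shows why using the whole area matters. This sketch assumes an idealized square footprint with a pin at every 0.1-in grid position and compares it with a DIP of the same length, which uses only its two rows; real PGAs often leave some positions unpopulated, so the PGA figures are limits of the idealized grid. The function name is hypothetical.

```python
def pin_sites(side_tenths: int) -> dict:
    """Idealized pin counts for a package edge on a 0.1-in grid.

    side_tenths is the edge length in tenths of an inch, which equals the
    number of 0.1-in grid positions along that edge.
    """
    return {
        "DIP": 2 * side_tenths,    # two rows along the package length
        "PGA": side_tenths ** 2,   # every position in the square grid
    }


for side in (10, 20):
    print(f"{side / 10:.1f} in:", pin_sites(side))
```

Doubling the edge length doubles the DIP's pin count but quadruples the full grid's, which is the essential advantage of an area array.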

The PGA provides high pin density, but its drawback is relatively high cost. Two lower-cost pack- 
ages were developed for ICs that require more pins than DIPs but fewer pins than found on a PGA: 
the small outline integrated circuit (SOIC) and the plastic leaded chip carrier (PLCC). Examples of 
SOIC and PLCC packages are shown in Fig. 2.9. Both SOICs and PLCCs feature pins on a 0.05-in 
pitch — half that of a DIP or PGA. The SOIC is basically a shrunken DIP with shorter pins that are 
folded parallel to the plane of the package instead of protruding down vertically. This enables the 
SOIC to be surface mounted onto the circuit board by soldering the pins directly to metal pads on the 
board. By contrast, a DIP requires that holes be drilled in the board for the pins to be soldered into. 
The SOIC represents an improvement in packaging density and ease of manufacture over DIPs, but 
it is still limited to relatively simple ICs due to its one-dimensional pin arrangement. 

FIGURE 2.8 Pin grid array package.

FIGURE 2.9 SOIC and PLCC.

PLCCs increase pin density and ease the design of the lead frame by placing pins along all four sides of the package rather than just two. Higher pin counts (68, 84, and 96 pins) were enabled by the PLCC, and its square design is more capable of accepting larger silicon dice than either the DIP or SOIC. PLCC leads are not bent outward, as in the case of an SOIC, but are curved inward in a “J” pattern. Because the PLCC package more closely matches the aspect ratio of the dice placed into it, its lead frames can have shorter and more consistent pin lengths, reducing the degrading effects on high-speed signals.

A higher-density relative of the PLCC and SOIC is the quad flat pack, or QFP. A QFP resembles 
a PLCC in terms of its square or rectangular shape but has leads that are bent outward like an SOIC. 
Additionally, QFP leads are thinner and spaced at a smaller pitch to achieve more than twice the lead 
density of a comparably sized PLCC. 

Perhaps the most widely used package for high-density ICs is the ball grid array, or BGA. The 
BGA is a surface mount analog to the PGA with significantly higher ball density. Contact is made 
between a BGA and a circuit board by means of many small preformed solder balls that adhere to 
contacts on the bottom surface of the BGA package. Figure 2.10 illustrates the general BGA form 
factor, but numerous variants on aspect ratio and ball pitch exist. Typical ball pitch ranges from 
1.27 mm down to 0.8 mm, and higher densities are on the way. 
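Because the ball count of a full array grows with the square of the number of balls per side, a modest reduction in pitch has a large effect. The sketch below assumes a hypothetical 27 mm square body with every grid position populated and ignores edge margins; lengths are in micrometers so the division stays in integers. The function name is hypothetical.

```python
def full_array_balls(body_um: int, pitch_um: int) -> int:
    """Balls in a fully populated square array (edge margins ignored)."""
    per_side = body_um // pitch_um  # whole pitches that fit along one edge
    return per_side * per_side


body = 27_000  # hypothetical 27 mm square body, in micrometers
for pitch in (1_270, 1_000, 800):
    print(f"{pitch / 1000} mm pitch: {full_array_balls(body, pitch)} balls")
```

Going from 1.27 mm to 0.8 mm pitch on the same body roughly multiplies the available ball count by 2.5, at the cost of finer circuit board features.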

There are many variations of the packaging technologies already mentioned. Most packages com- 
ply with industry standard dimensions, but others are proprietary. Semiconductor manufacturers pro- 
vide detailed drawings of their packages to enable the proper design of circuit boards for their 
products. 



FIGURE 2.10 Ball grid array.



2.3 THE 7400-SERIES DISCRETE LOGIC FAMILY



With the advent of ICs in the early 1960s, engineers needed ready access to a library of basic logic gates so that these gates could be wired together on circuit boards and turned into useful products. Semiconductor companies began to recognize a market for standard, off-the-shelf logic ICs that would spare engineers from designing a custom microchip for each new project. In 1963 and 1964, Sylvania and Texas Instruments began shipment of the 7400-series discrete logic family and unknowingly started a de facto industry standard that lasts to this day and shows no signs of disappearing anytime soon. Using the 7400 family, an engineer can select logic gates, flip-flops, counters, and buffers in individual packages and wire them together as desired to solve a specific problem. Some of the most common members of the 7400 family are listed in Table 2.1.



TABLE 2.1 Common 7400 ICs

Part Number   Function                                             Number of Pins
7400          Quad two-input NAND gates                            14
7402          Quad two-input NOR gates                             14
7404          Hex inverters                                        14
7408          Quad two-input AND gates                             14
7432          Quad two-input OR gates                              14
7447          BCD to seven-segment display decoder/driver          16
7474          Dual D-type positive edge triggered flip-flops       14
7490          Four-bit decade counter                              14
74138         Three-to-eight decoder                               16
74153         Dual 4-to-1 multiplexers                             16
74157         Quad 2-to-1 multiplexers                             16
74160         Four-bit synchronous decade counter                  16
74164         Eight-bit serial-in, parallel-out shift register     14
74174         Hex D-type flip-flops with clear                     16
74193         Four-bit synchronous up/down binary counter          16
74245         Octal bus transceivers with tri-state outputs        20
74373         Octal D-type transparent latches                     20
74374         Octal D-type flip-flops                              20



These are just a few of the full set of 7400 family members. Many 7400 parts are no longer used, 
because their specific function is rarely required as a separate chip in modern digital electronics de- 
signs. However, the parts listed above, and many others that are not listed, are still readily available 
today and are commonly found in a broad range of digital designs ranging from low-end to high- 
tech devices. 7400-series logic has been available in DIPs for a long time, as well as (more recently) 
SOICs and other high-density surface mount packages. All flavors of basic logic gates are available 
with varying numbers of inputs. For example, there are 2-, 3-, and 4-input AND gates and 2-, 3-, 4-, 
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8-, 12-, and 13-input NAND gates. There are numerous varieties of flip-flops, counters, multiplexers, 
shift registers, and bus transceivers. Flip-flops exist with and without complementary outputs, pre- 
set/clear inputs, and independent clocks. Counters are available in 4-bit blocks that can both incre- 
ment and decrement and count to either 15 (binary counter) or 9 (decade counter) before restarting 
the count at 0. Shift registers exist in all permutations of serial and parallel inputs and outputs. Bus 
transceivers in 4- and 8-bit increments exist with different types of output enables and capabilities to 
function in unidirectional or bidirectional modes. Bus transceivers enable the creation and expansion 
of tri-state buses on which multiple devices can communicate. 
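The wrap-around behavior of binary and decade counters described above can be sketched as a purely behavioral Python model. The function names and the `modulus` parameter are illustrative assumptions, not part of any 7400 data sheet, and the sketch ignores the load, clear, enable, and carry/borrow pins that real counter ICs provide.

```python
def next_count(count: int, up: bool = True, modulus: int = 16) -> int:
    """Next value of a 4-bit counter that wraps at `modulus`.

    modulus=16 models a binary counter (0..15); modulus=10 models a
    decade counter (0..9). Counting down from 0 wraps to modulus - 1.
    """
    if not 0 <= count < modulus:
        raise ValueError(f"count {count} is out of range for modulus {modulus}")
    step = 1 if up else -1
    # Python's % returns a value in 0..modulus-1 even when count + step is -1.
    return (count + step) % modulus


def count_sequence(start: int, cycles: int, up: bool = True, modulus: int = 16) -> list[int]:
    """Counter values observed over `cycles` clock edges, starting at `start`."""
    values = [start]
    for _ in range(cycles):
        values.append(next_count(values[-1], up, modulus))
    return values


print(count_sequence(14, 3))             # binary up:   [14, 15, 0, 1]
print(count_sequence(8, 3, modulus=10))  # decade up:   [8, 9, 0, 1]
print(count_sequence(1, 3, up=False))    # binary down: [1, 0, 15, 14]
```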

One interesting IC is the 7447 seven-segment display driver. This component allows the creation 
of graphical numeric displays in applications such as counters and timers. Seven-segment displays 
are commonly seen in automobiles, microwave ovens, watches, and consumer electronics. Seven independent on/off elements can represent all ten digits as shown in Fig. 2.11. The 7447 is able to drive an LED-based seven-segment display when given a binary coded decimal (BCD) input. BCD is a four-bit binary number that has valid values from 0 through 9. Hexadecimal values from 0xA through 0xF are not considered legal BCD values. 
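The BCD-to-seven-segment mapping can be sketched as a lookup table. This Python sketch assumes the conventional a-through-g segment labels and models active-low outputs (0 lights a segment), as used to drive an LED display. The glyphs chosen here for 6 and 9 are an assumption and may differ from the 7447's own patterns, and the sketch simply rejects 0xA through 0xF instead of reproducing whatever the 7447 displays for those inputs. The names are hypothetical.

```python
# Segments lit for each decimal digit, using the conventional labels:
# a = top, b = upper right, c = lower right, d = bottom,
# e = lower left, f = upper left, g = middle.
SEGMENTS_LIT = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg",
}


def bcd_to_segments(bcd: int) -> dict[str, int]:
    """Active-low segment drive levels (0 = segment on) for one BCD digit."""
    if not 0 <= bcd <= 9:
        # 0xA through 0xF (and anything wider than 4 bits) are not legal BCD.
        raise ValueError(f"{bcd:#x} is not a legal BCD value")
    lit = SEGMENTS_LIT[bcd]
    return {seg: 0 if seg in lit else 1 for seg in "abcdefg"}


print(bcd_to_segments(7))  # {'a': 0, 'b': 0, 'c': 0, 'd': 1, 'e': 1, 'f': 1, 'g': 1}
```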

Familiarity with the 7400 series proves very useful no matter what type of digital system you are 
designing. For low-end systems, 7400-series logic may be the only type of IC at your disposal to 
solve a wide range of problems. At the high end, many people are often surprised to see a small 14- 
pin 7400-series IC soldered to a circuit board alongside a fancy 32-bit microprocessor running at 
100 MHz. The fact is that the basic logic functions that the 7400 series offers are staples that have di- 
rect applications at all levels of digital systems design. It is time well spent to become familiar with 
the extensive capabilities of the simple yet powerful 7400 family. Manufacturers’ logic data books, either in print or online, are invaluable references. It can be difficult to know ahead of time if a design may call for one more gate to function properly; that is when a 40-year-old logic family can save the day. 



2.4 APPLYING THE 7400 FAMILY TO LOGIC DESIGN 



Applications of the 7400 family are virtually limitless, because the various ICs represent basic building 
blocks rather than complete solutions. Up through the early 1980s, it was common to see computer 
systems constructed mainly from interconnected 7400-series ICs along with a few LSI components 
such as a microprocessor and a few memory chips. These days, most commercial digital systems are 
designed using some form of higher-density logic IC, either fully custom or user programmable. 
However, the engineer or hobbyist who has a relatively small-scale logic problem to solve, and who 
may not have access to more expensive custom or programmable logic ICs, may be able to utilize 
only 7400 logic in an efficient and cost-effective solution. Two examples follow to provide insight 
into how 7400 building blocks can be assembled to solve logic design problems. 

A hypothetical example is a logic circuit that examines three switches and turns on an LED if two, and only two, of the three switches are turned on. The truth table for such a circuit is shown in Table 2.2, given that A, B, and C are the inputs, and LED is the active-low output (assume that the LED is turned on by driving a logic 0 rather than a logic 1). 

FIGURE 2.11 Seven-segment display. 



TABLE 2.2 LED Driver Logic Truth Table 

A   B   C   LED
0   0   0   1
0   0   1   1
0   1   0   1
0   1   1   0
1   0   0   1
1   0   1   0
1   1   0   0
1   1   1   1



This LED driver truth table can be converted into the following Boolean logic equation with a Kar- 
naugh map or simply by inspection: 



$$\mathrm{LED} = \overline{\overline{A}BC + A\overline{B}C + AB\overline{C}}$$
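One way to double-check the equation against Table 2.2 is to evaluate both exhaustively. The Python sketch below uses 0/1 integers for logic levels and `1 - x` as an inverter; the function names are hypothetical.

```python
from itertools import product


def led_from_rule(a: int, b: int, c: int) -> int:
    """Active-low LED: 0 (lit) when exactly two of the three switches are on."""
    return 0 if a + b + c == 2 else 1


def led_from_equation(a: int, b: int, c: int) -> int:
    """LED = NOT(A'BC + AB'C + ABC'), built from inverters, ANDs, and ORs."""
    na, nb, nc = 1 - a, 1 - b, 1 - c                      # three input inverters
    products = (na & b & c) | (a & nb & c) | (a & b & nc)  # AND terms ORed together
    return 1 - products                                     # fourth inverter: active-low output


for a, b, c in product((0, 1), repeat=3):
    assert led_from_equation(a, b, c) == led_from_rule(a, b, c)
    print(a, b, c, led_from_equation(a, b, c))
```

The printed rows reproduce Table 2.2 in the same order.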

After consulting a list of available 7400 logic ICs, three become attractive for our application: the 7404 inverter, 7408 AND, and 7432 OR. The LED driver logic equation requires four inverters (one each for A, B, and C, plus one to produce the active-low output), six 2-input AND gates (two for each three-input product term), and two 2-input OR gates. Four ICs are required, because a 7404 provides six inverters, a 7408 provides four AND gates (so two 7408s are needed), and a 7432 contains four OR gates. These four ICs can be connected according to a schematic diagram as shown in Fig. 2.12. A schematic diagram illustrates the electrical connectivity scheme of various components. Each component is identified by a 
reference designator consisting of a letter followed by a number. ICs are commonly identified by 
reference designators beginning with the letter “U”. Additionally, each component has numerous 
pins that are numbered on the diagram. These pin numbers conform to the IC manufacturer’s num- 
bering scheme. Each of these 7400-series ICs has 14 pins. Another convention that remains from bi- 
polar logic days is the use of the label VCC to indicate the positive voltage supply node. GND 
represents ground — the common, or return, voltage supply node. 

All ICs require connections to a power source. In this circuit, +5 V serves as the power supply, be- 
cause the 7400 family is commonly manufactured in a bipolar semiconductor process requiring a 
+5-V supply. The four rectangular blocks at the top of the diagram represent this power connection 
information. Because this schematic diagram shows individual gates, the gates’ reference designa- 
tors contain an alphabetic suffix to identify unique instances of gates within the same IC. Not all 
gates in each IC are actually used. Those that are unused are tied inactive by connecting their inputs 
to a valid logic level — in this case, ground. It would be equally valid to connect the inputs of unused 
gates to the positive supply voltage, +5 V. 

FIGURE 2.12 LED driver logic implementation. 

This logic circuit would work, but a more efficient solution is available to those who are familiar with the capabilities of the 7400 family. The 7411 provides three 3-input AND gates, which are ideal for this application, allowing a reduction in the part count to three ICs instead of four. This circuit is shown in Fig. 2.13 with alternative notation to illustrate varying circuit presentation styles. Rather than drawing gates as separate elements, the complete 7400-series ICs are shown as monolithic blocks. Either notation is commonly accepted; the choice depends on the engineer’s preference. 
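The package arithmetic behind both versions can be written out as a small Python sketch. The gates-per-package figures are the ones stated in the text (six inverters per 7404, four gates per 7408 and 7432, three per 7411); reading the three-IC version as a 7404, a 7411, and a 7432 is an assumption, and the helper name is hypothetical.

```python
import math

# Gates available in one package of each part, as stated in the text.
GATES_PER_PACKAGE = {"7404": 6, "7408": 4, "7411": 3, "7432": 4}


def packages_needed(gates: dict[str, int]) -> dict[str, int]:
    """Packages of each part required to supply the given number of gates."""
    return {part: math.ceil(n / GATES_PER_PACKAGE[part]) for part, n in gates.items()}


# Two-input version: 4 inverters, six 2-input ANDs, two 2-input ORs.
two_input = packages_needed({"7404": 4, "7408": 6, "7432": 2})
# Three-input version: 4 inverters, three 3-input ANDs, two 2-input ORs.
three_input = packages_needed({"7404": 4, "7411": 3, "7432": 2})

print(two_input, sum(two_input.values()))      # {'7404': 1, '7408': 2, '7432': 1} 4
print(three_input, sum(three_input.values()))  # {'7404': 1, '7411': 1, '7432': 1} 3
```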



2.5 SYNCHRONOUS LOGIC DESIGN WITH THE 7400 FAMILY 



The preceding LED driver example shows how stateless logic (logic without flops and a clock) can 
be designed to implement an arbitrary logic equation. Stateful logic is almost always required in a 
digital system, because it is necessary to advance one step at a time (one step each cycle) through an 
algorithm. Some 7400 ICs, such as counters, implement synchronous logic within the IC itself by 
combining Boolean logic gates and flops on the same die. Other 7400 ICs implement only flops that 
may be combined externally with logic to create the desired function. 

An example of a synchronous logic application is a basic serial communications controller. Serial 
communications is the process of taking parallel data, perhaps a byte of information, and transmit- 
ting or receiving that byte at a rate of one bit per clock cycle. The obvious downside of doing this is 
that it will take longer to transfer the byte, because it would be faster to just send the entire byte dur- 
ing the same cycle. The advantage of serial communications is a reduction in the number of wires re- 
quired to transfer information. Being able to string only a few wires between buildings instead of 
dozens usually compensates for the added serial transfer time. If the time required to serially transfer 
bits is too slow, the rate at which the bits are sent can be increased with some engineering work to 
achieve the desired throughput. Such speed improvements are beyond the scope of this presentation. 

FIGURE 2.13 LED driver logic using 7411 with fewer ICs. 

Real serial communications devices can get fairly complicated. For purposes of discussion, a fairly simple approach is taken. Once the decision is made to serialize a data byte, the problem arises of knowing when that byte begins and ends. Framing is the process of placing special patterns into the data stream to indicate the start and end of data units. Without some means to frame the individual bits as they are transmitted, the receiver would have no means of finding the first and last bits of each byte. In this example, a single start bit is used to mark the first bit. Once the first bit is detected, the last bit is found by knowing that there are eight bits in a byte. During periods of inactivity, an idle communications interface is indicated by a persistent logic 0. When the transmitter is given a byte to send, it first drives a logic-1 start bit and then sends eight data bits. Each bit is sent in its own clock cycle. Therefore, nine clock cycles are required to transfer each byte. The serial interface is composed of two signals, clock and serial data, and functions as shown in Fig. 2.14. 
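The framing just described can be sketched as a function that turns a byte into the nine bits driven onto serial data, one per clock cycle. This is a behavioral illustration under the text's conventions (idle at logic 0, a logic-1 start bit, least-significant bit first); the function names and the `idle_before` parameter are hypothetical.

```python
def frame_byte(byte: int) -> list[int]:
    """Nine serial bits for one byte: a logic-1 start bit, then bits 0 (LSB) to 7 (MSB)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0x00..0xFF")
    return [1] + [(byte >> i) & 1 for i in range(8)]


def serialize(data: bytes, idle_before: int = 2) -> list[int]:
    """Bit stream for several bytes sent back to back, preceded by idle (logic 0) cycles."""
    stream = [0] * idle_before
    for byte in data:          # iterating over bytes yields ints
        stream += frame_byte(byte)
    return stream


print(frame_byte(0xA5))             # [1, 1, 0, 1, 0, 0, 1, 0, 1]
print(len(serialize(b"\xa5\x3c")))  # 20 = 2 idle cycles + 2 bytes x 9 cycles
```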

The eight data bits are sent from least-significant bit, bit 0, to most-significant bit, bit 7, following the start bit. Following the transmission of bit 7, it is possible to immediately begin a new byte by inserting a new start bit. This timing diagram does not show a new start bit directly following bit 7. The corresponding output of the receiver is shown in Fig. 2.15. Here, data out is the eight-bit quantity that has been reconstructed from the serialized bit stream of Fig. 2.14. Ready indicates when data out is valid and is active-high. 

FIGURE 2.14 Serial interface bit timing. 

FIGURE 2.15 Serial receive output timing. 

All that is required of this receiver is to assemble the eight data bits in their proper order and then 
generate a ready signal. This ready signal lasts only one cycle, and any downstream logic waiting for 
the newly arrived byte must process it immediately. In a real system, a register might exist to capture 
the received byte when ready goes active. This register would then pass the byte to the appropriate 
destination. This output timing shows two bytes transmitted back to back. They are separated by 
nine cycles, because each byte requires an additional start bit for framing. 

In contemplating the design of the receive portion of the serial controller, it becomes apparent that a serial-in/parallel-out shift register is needed to assemble the individual bits into a whole byte. Additionally, some control logic is necessary to recognize the start bit, wait eight clocks to assemble the 
incoming byte, and then generate a ready signal. This receiver has two basic states, or modes, of op- 
eration: idle and receiving. When idling, no start bit has yet been detected, so there is no useful work 
to be done. When receiving, a start bit has been observed, incoming bits are shifted into the shift reg- 
ister, and then a ready signal is generated. As soon as the ready signal is generated, the receiver state 
may return to idle or remain in receiving if a new start bit is detected. Because there are two basic 
control logic states, the state can be stored in a single flip-flop, forming a two-state finite state ma- 
chine (FSM). An FSM is formed by one or more state flops with accompanying logic to generate a 
new state for the next clock cycle based on the current cycle’s state and inputs. The state is represented by the 
combined value of the state flops. An FSM with two state flops can represent four unique states. 
Each state can represent a particular step in an algorithm. The accompanying state logic controls the 
FSM by determining when it is time to transition to a new piece of the algorithm — a new state. 

In the serial receive state machine, transitioning from idle to receiving can be done according to 
the serial data input, which is 0 when inactive and 1 when indicating a start bit. Transitioning back to 
idle must somehow be done nine cycles later. A counter could be used but would require some logic 
to sense a particular count value. Instead, a second shift register can be used to delay the start bit by 
nine cycles. When the start bit emerges from the last output bit in the shift register, the state machine 
can return to the idle state. Consider the logic in Fig. 2.16. The arrow-shaped boxes indicate connec- 
tion points, or ports, of the circuit. 

Under an idle condition, the input to the shift register is zero until the start bit appears at the data 
input, din. Nine cycles later, the ready bit emerges from the shift register. As soon as the start bit is 
observed, the state machine transitions to the receiving state, changing the idle input to 0, effectively 
masking further input to the shift register. This masking prevents nonzero data bits from entering the 
ready delay logic and causing false results. 

FIGURE 2.16 Serial receive ready delay. 

Delaying the start bit by nine cycles solves one problem but creates another. The transition of the state machine back to idle is triggered by the emergence of ready from the shift register. Therefore, this transition will actually occur ten cycles after the start bit, because the state flop, like all D flip-flops, requires a single cycle of latency to propagate its input to its output. This additional cycle will prevent the control logic from detecting a new start bit immediately following the last data bit of the byte currently in progress. A solution is to design ready with its nine-cycle delay and ready_next with an eight-cycle delay by tapping off one stage earlier in the shift register. In doing so, the state machine can look ahead one cycle into the future and return to idle in time for a new start bit that may be arriving. With the logical details of the state machine now complete, the state machine can be represented with the state transition diagram in Fig. 2.17. 

A state transition diagram, often called a bubble diagram, shows all the states of an FSM and the 
logical arcs that dictate how one state leads to another. When implemented, the arcs are translated 
into the state logic to make the FSM function. With a clearly defined state transition diagram, the 
logic to drive the state machine can be organized as shown in Table 2.3. 



TABLE 2.3 Serial Receive State Machine Logic Truth Table 

Current State   din   ready_next   Next State
1               0     X            1
1               1     X            0
0               X     0            0
0               X     1            1



When in the idle state (1), a high on din (the start bit) must be observed to transition to the receiving state (0). Once in the receiving state, ready_next must be high to return to idle. This logic is represented by the Boolean equation 

$$\text{Next} = \text{State}\cdot\overline{\text{din}} + \overline{\text{State}}\cdot\text{ready\_next}$$
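The receive logic described so far can be exercised with a cycle-by-cycle behavioral model. The Python sketch below assumes one clock per list element, a state bit where 1 means idle, a nine-stage ready delay whose input is din masked by idle (ready_next is the eighth stage, ready the ninth), and an eight-bit serial-in/parallel-out register that fills from the top so that the first data bit ends up in bit 0. It models only the logic described up to this point, so a gate-level circuit built from 7400 parts may differ in cycle-level details. All names are hypothetical.

```python
from itertools import product


def next_state(state: int, din: int, ready_next: int) -> int:
    """Next = State·din' + State'·ready_next (1 = idle, 0 = receiving)."""
    return (state & (1 - din)) | ((1 - state) & ready_next)


# Check the equation against every row of Table 2.3 (None marks an X entry).
TABLE_2_3 = [((1, 0, None), 1), ((1, 1, None), 0), ((0, None, 0), 0), ((0, None, 1), 1)]
for (s, d, r), expected in TABLE_2_3:
    for din, ready_next in product((0, 1) if d is None else (d,), (0, 1) if r is None else (r,)):
        assert next_state(s, din, ready_next) == expected


def receive(stream: list[int]) -> list[tuple[int, int]]:
    """Return (cycle, byte) for every cycle in which ready is asserted."""
    state = 1          # state flop, reset to idle
    delay = [0] * 9    # ready delay shift register; delay[0] is the first stage
    data = 0           # 8-bit serial-in/parallel-out data register
    received = []
    for cycle, din in enumerate(stream):
        # Outputs during this cycle come from the current register contents.
        ready_next = delay[7]   # eighth stage: one cycle ahead of ready
        ready = delay[8]        # ninth stage
        if ready:
            received.append((cycle, data))
        # Clock edge: all registers load at once.
        new_state = next_state(state, din, ready_next)
        delay = [din & state] + delay[:-1]   # idle (state == 1) masks din
        data = (data >> 1) | (din << 7)      # new bit enters at bit 7; oldest bit falls out
        state = new_state
    return received


def frame(byte: int) -> list[int]:
    """Start bit followed by data bits, LSB first."""
    return [1] + [(byte >> i) & 1 for i in range(8)]


stream = [0, 0] + frame(0xA5) + frame(0x3C) + [0, 0, 0]
print(receive(stream))  # [(11, 165), (20, 60)]
```

In this trace the first start bit arrives in cycle 2, and the two back-to-back bytes produce ready in cycles 11 and 20, nine cycles apart, with the second start bit accepted in the same cycle that ready is asserted for the first byte.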

As with most problems, there exists more than one solution. Depending on the components available, one may choose to design the logic differently to make more efficient use of those components. As a general rule, it is desirable to limit the number of ICs used. The 7451 provides two “AND-OR-INVERT” gates, each of which implements the Boolean function 

$$Y = \overline{AB + CD}$$

This function is tantalizingly close to what is required for the state machine. It differs in two respects: two of the required inputs (state and din) must be available in inverted form, and the 7451’s output is inverted (a NOR of the two AND terms) rather than the plain OR that is required. Both differences can be resolved using a 7404 inverter IC, but there is a more efficient solution using the 
74175 quad flop. The 74175’s four flops each provide both true and inverted outputs. Therefore, a 
separate 7404 is not necessary. An inverted version of din can be obtained by passing din through a 
flip-flop before feeding the remainder of the circuit’s logic. For purposes of notation, we will refer to 
this “flopped” din as din Another flop will be used for the state machine. The inverted output of the 
state flop will compensate for the NOR vs. OR function of the 7451. A third flop will form the ninth 
bit of the ready delay shift register when combined with a 74164 eight-bit parallel-out shift register. 




FIGURE 2.17 Serial receive state machine. 
Conveniently, the 74164 contains an internal AND gate at its input to implement the idle-enable of 
the start bit into the shift register. 
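The complementary-output trick can be checked exhaustively. The Python sketch below assumes the 7451 section is fed state and inverted din on one AND pair and inverted state and ready_next on the other, that its output drives the state flop’s D input, and that the state is read from that flop’s inverted output. This is one reading of the description above, and it ignores the extra cycle of latency that flopping din introduces. Function names are hypothetical.

```python
from itertools import product


def aoi_7451(a: int, b: int, c: int, d: int) -> int:
    """One 7451 AND-OR-INVERT section: Y = NOT(A·B + C·D)."""
    return 1 - ((a & b) | (c & d))


def next_state(state: int, din: int, ready_next: int) -> int:
    """Required next state from Table 2.3 (1 = idle, 0 = receiving)."""
    return (state & (1 - din)) | ((1 - state) & ready_next)


for state, din, ready_next in product((0, 1), repeat=3):
    din_n = 1 - din        # inverted din, from the din flop's complementary output
    state_n = 1 - state    # the state flop's true output Q, the inverse of state
    d_input = aoi_7451(state, din_n, state_n, ready_next)
    q_after_clock = d_input          # the flop loads D on the clock edge
    new_state = 1 - q_after_clock    # state is read from the inverted output
    assert new_state == next_state(state, din, ready_next)

print("7451 plus inverted flop output matches Table 2.3")
```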

The total parts count for this serial receiver is four 7400-family ICs: two 74164 shift registers, one 
7451 AND-OR-INVERT, and one 74175 quad flop. One flop and one-half of the 7451 are unused in 
this application. Figure 2.18 shows how these ICs are connected to implement the serial receive 
logic. Note that a mixed style of IC representation is used: most ICs are shown in a single block, but 
the 74175 is broken into separate flops for clarity. Even if an IC is represented as a single block, it is 
not necessary to draw the individual pins in the order in which they physically appear. As with the 
previous example, the graphical representation of logic depends on individual discretion. In addition 
to being functionally and electrically correct, a schematic diagram should be easy to understand. 

All synchronous elements, the shift registers and flops, are driven by an input clock signal, clk. 
The synchronous elements involved in the control path of the logic are also reset at the beginning of 
operation with the active-low reset_ signal. Reset_ is necessary to ensure that the state flop and the 
ready_next delay logic begin in an idle state when power is first applied. This is necessary, because 
flip-flops power up in a random, hence unknown, state. Once they are explicitly reset, they hold their 
state until the logic specifically changes their state. The shift register in the data path, U3, does not 
require a reset, because its contents are not used until eight valid data bits are shifted in, thereby 
flushing the eight bits with random power-up states. It would not hurt to connect U3’s clr_ pin to 
reset_, but this is not done to illustrate the option that is available. In certain logic implementations, 
adding reset capability to a flop may incur a penalty in terms of additional cost or circuit size. When 
a reset function is not free, it may be decided not to reset certain flops if their contents do not need to 
be guaranteed at power up, as is the case here. 
FIGURE 2.18 Serial receive logic schematic diagram. 
In this logic circuit, the inverted output of the state flop, U1B, is used as the state bit to compensate for the 7451’s NOR function. The unused clr_ and b pins of U3 are connected to +5 V so that they have no effect on the shift register’s behavior. The shift register will not clear itself, because clr_ is active-low, and the internal input AND gate that combines a and b is logically bypassed by tying b to logic 1. The parallel byte output of this serial receiver is designated Dout[7:0] 
and is formed by grouping the eight outputs of the shift register into a single bus. One common nota- 
tion for assigning members of a bus is to connect each individual member to a thicker line with some 
type of bus-ripper line. The bus ripper is often drawn in the schematic diagram as mitered or curved 
at the bus end to make its function more visually apparent. 

Designing an accompanying serial transmitter follows a very similar design process to the preced- 
ing discussion. It is left as an exercise to the reader. 



2.6 COMMON VARIANTS OF THE 7400 FAMILY 



In the 1970s and 1980s, the 7400 family was commonly manufactured in a bipolar semiconductor 
process that operated using a +5-V power supply and was known as transistor-transistor logic (TTL). 
The discussion of the 7400 family thus far has included only the original +5-V bipolar type. The 
7400’s popularity and broad application to digital design has kept it relevant through many improve- 
ments in semiconductor process technology. As engineers learned to fabricate faster and more effi- 
cient ICs, the 7400 was redesigned in many different process generations beginning in the late 
1960s. Some of the more common 7400 variants are briefly discussed here. 

The original 7400 discrete TTL logic family featured typical propagation delays of 10 ns per gate 
and power consumption, also called power dissipation, of approximately 10 mW per gate. By mod- 
ern standards, the 7400’s speed is relatively slow, and its power dissipation is relatively high. In- 
creasing system complexity dictates deeper logic: more gates chained together to implement more 
complex Boolean functions. Each added level of logic adds at least another gate’s worth of propaga- 
tion delay. At the same time, power consumption also becomes a problem. Ten milliwatts may not 
sound like a lot of power, but, when multiplied by several thousand gates, it represents a substantial 
design problem in terms of both supplying a large quantity of power and cooling the radiated heat 
from digital systems. 
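To put rough numbers on this, assume a hypothetical system of 5,000 gates (an illustrative stand-in for "several thousand") and a logic path four gates deep, using the 7400’s typical figures of 10 mW and 10 ns per gate:

```latex
\begin{aligned}
P_{\text{total}} &= 5000 \times 10\ \text{mW} = 50\ \text{W}, \\
I_{CC} &= \frac{P_{\text{total}}}{V_{CC}} = \frac{50\ \text{W}}{5\ \text{V}} = 10\ \text{A}, \\
t_{\text{path}} &\approx 4 \times 10\ \text{ns} = 40\ \text{ns}
  \;\Rightarrow\; f_{\max} \lesssim \frac{1}{40\ \text{ns}} = 25\ \text{MHz}.
\end{aligned}
```

The 25 MHz bound is optimistic: it ignores flop setup and clock-to-output times, which lower the achievable clock rate further.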

Two notable bipolar variants of the 7400 are the 74LS and 74F families. The 74LS, LS indicating 
low-power Schottky, has speed comparable to that of the original 7400, but it dissipates roughly 20 percent of the 7400’s power. The 74F, F indicating fast, is approximately 80 percent faster than the 7400 and 
reduces power consumption by almost half. Whether the concern is reducing power or increasing 
speed, these two families are useful for applications requiring 5-V bipolar technology. 

CMOS technology began to emerge in the 1980s as a popular process for fabricating digital ICs 
as a result of its lower power consumption as compared to bipolar. The low-power characteristics of 
CMOS logic stem from the fact that a FET requires essentially no current to keep it in an on or off 
state (unlike a BJT, which always draws some current when it is turned on). A CMOS gate, therefore, draws significant current only when it switches. For this reason, the power consumption of a CMOS 
logic gate is extremely low in an idle, or quiescent, state and increases with the frequency at which it 
switches. 
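A common first-order approximation, not given in the original text, captures this frequency dependence: dynamic power is roughly proportional to the switching activity, the load capacitance, the square of the supply voltage, and the switching frequency. Here α is the fraction of clock cycles in which the output charges and discharges its load. The values below (α = 1, a 50 pF load, 5 V) are hypothetical and chosen only to show the scaling; quiescent leakage is ignored.

```latex
\begin{aligned}
P_{\text{dyn}} &\approx \alpha\, C_L\, V_{CC}^{2}\, f \\
P_{\text{dyn}}(1\ \text{MHz}) &\approx 1 \times 50\ \text{pF} \times (5\ \text{V})^{2} \times 1\ \text{MHz} = 1.25\ \text{mW} \\
P_{\text{dyn}}(10\ \text{MHz}) &\approx 1 \times 50\ \text{pF} \times (5\ \text{V})^{2} \times 10\ \text{MHz} = 12.5\ \text{mW}
\end{aligned}
```

A tenfold increase in switching frequency gives a tenfold increase in dynamic power, while a gate that never switches consumes only its small quiescent current.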

Several CMOS 7400 families were introduced, among them the 74HCT and 74ACT, each of which has power consumption orders of magnitude less than bipolar equivalents at low frequencies. Earlier CMOS versions of the 7400 were not fully compatible with the bipolar devices, because of voltage threshold differences between the CMOS and bipolar processes. A typical TTL output is only guaranteed to rise above 2.5 V, depending on output loading. In contrast, a typical 5-V CMOS input requires a minimum level of around 3 V to guarantee detecting a logic 1. This inconsistency in voltage range causes a fundamental problem in which a TTL gate driving an ordinary CMOS gate cannot be guaranteed to operate in all situations. Both the 74HCT and 74ACT families possess the low-power benefits of CMOS technology and retain compatibility with bipolar ICs by accepting TTL input voltage levels. A 74HCT device is somewhat slower than a 74LS equivalent, and the 74ACT is faster than a 74LS device. 
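The compatibility problem reduces to comparing a driver’s guaranteed output levels with a receiver’s input thresholds. The Python sketch below does that in integer millivolts. The TTL output high level (2.5 V) and the ordinary CMOS input high threshold (3 V) come from the text; the TTL output low level (0.5 V, the 74LS00 figure in Fig. 2.19) and the ordinary CMOS input low threshold (1.5 V) are illustrative assumptions, and the 74HCT values (2.0 V and 0.8 V) are the TTL-compatible input thresholds that family is designed around. The helper name is hypothetical.

```python
def levels_compatible(voh_min_mv: int, vol_max_mv: int,
                      vih_min_mv: int, vil_max_mv: int) -> bool:
    """True if a driver's guaranteed outputs always satisfy a receiver's input thresholds."""
    high_ok = voh_min_mv >= vih_min_mv   # worst-case logic 1 is still read as 1
    low_ok = vol_max_mv <= vil_max_mv    # worst-case logic 0 is still read as 0
    return high_ok and low_ok


TTL_OUTPUT = {"voh_min_mv": 2500, "vol_max_mv": 500}
CMOS_5V_INPUT = {"vih_min_mv": 3000, "vil_max_mv": 1500}  # illustrative values
HCT_INPUT = {"vih_min_mv": 2000, "vil_max_mv": 800}       # TTL-compatible thresholds

print(levels_compatible(**TTL_OUTPUT, **CMOS_5V_INPUT))  # False: 2.5 V may not read as a 1
print(levels_compatible(**TTL_OUTPUT, **HCT_INPUT))      # True
```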

There has been an explosion of 7400 variants. Most of the families introduced in the last decade are based on CMOS technology and are tailored to a broad set of applications ranging from high-speed logic to high-power bus drivers. Most types of 7400 devices share common pin-outs and functions, with the exception of some proprietary specialized parts that may be produced by only a single manufacturer. Most of the 7400 families still require +5-V supplies, but lower voltages such as 3.3 V, 2.5 V, 1.8 V, and 1.5 V are available as well. These lower-voltage families are important because of the general trend toward lower voltages for digital logic. 



2.7 INTERPRETING A DIGITAL IC DATA SHEET 



Semiconductor manufacturers publish data sheets for each of their products. Regardless of the specific family or device, all logic IC data sheets share common types of information. Once the basic data sheet terminology and organization are understood, it is relatively easy to figure out other data 
sheets even when their exact terminology changes. Data sheet structure is illustrated using the 
74LS00 from Fairchild Semiconductor as an example. A page from its data sheet is shown in Fig. 
2.19. 

Digital IC data sheets should have at least two major sections: functional description and electri- 
cal specifications. The functional description usually contains the device pin assignment, or pin-out, 
as well as a detailed discussion of how the part logically operates. A simple IC such as the 74LS00 
will have a very brief functional description, because there is not much to say about a NAND gate's 
operation. More complex ICs such as microprocessors can have functional descriptions that fill doz- 
ens or hundreds of pages and are broken into many chapters. Some data sheets add additional sec- 
tions to present the mechanical dimensions of the package and its thermal properties. Digital IC 
electrical specifications are similar across most types of devices and often appear in the following 
four categories: 

• Absolute maximum ratings. As the term implies, these parameters specify the absolute extremes 
that the IC may be subjected to without sustaining permanent damage. Manufacturers almost uni- 
versally state that the IC should never be operated under these extreme conditions. These ratings 
are useful, because they indicate how the device may be stored and express the quality of design 
and manufacture of the physical chip. Manufacturers specify a storage temperature range within 
which the semiconductor structures will not break down. In the case of Fairchild's 74LS00, this 
range is -65 to 150°C. Maximum voltage levels are also specified, 7 V in the case of the 74LS00, 
indicating that the device may be subjected to a 7-V potential without sustaining permanent damage. 

Absolute Maximum Ratings (Note 1)

Supply Voltage                          7 V
Input Voltage                           7 V
Operating Free Air Temperature Range    0°C to +70°C
Storage Temperature Range               -65°C to +150°C

Note 1: The "Absolute Maximum Ratings" are those values beyond which the safety of the device cannot be guaranteed. The device should not be operated at these limits. The parametric values defined in the Electrical Characteristics tables are not guaranteed at the absolute maximum ratings. The "Recommended Operating Conditions" table will define the conditions for actual device operation.

Recommended Operating Conditions

Symbol  Parameter                         Min    Nom   Max    Units
VCC     Supply Voltage                    4.75   5     5.25   V
VIH     HIGH Level Input Voltage          2                   V
VIL     LOW Level Input Voltage                        0.8    V
IOH     HIGH Level Output Current                      -0.4   mA
IOL     LOW Level Output Current                       8      mA
TA      Free Air Operating Temperature    0            70     °C

Electrical Characteristics
over recommended operating free air temperature range (unless otherwise noted)

Symbol  Parameter                         Conditions                          Min    Typ (Note 2)   Max     Units
VI      Input Clamp Voltage               VCC = Min, II = -18 mA                                    -1.5    V
VOH     HIGH Level Output Voltage         VCC = Min, IOH = Max, VIL = Max     2.7    3.4                    V
VOL     LOW Level Output Voltage          VCC = Min, IOL = Max, VIH = Min            0.35           0.5     V
                                          IOL = 4 mA, VCC = Min                      0.25           0.4     V
II                                        VCC = Max, VI = 7 V                                       0.1     mA
IIH     HIGH Level Input Current          VCC = Max, VI = 2.7 V                                     20      µA
IIL     LOW Level Input Current           VCC = Max, VI = 0.4 V                                     -0.36   mA
IOS     Short Circuit Output Current      VCC = Max (Note 3)                  -20                   -100    mA
ICCH    Supply Current with Outputs HIGH  VCC = Max                                  0.8            1.6     mA
ICCL    Supply Current with Outputs LOW   VCC = Max                                  2.4            4.4     mA

Note 2: All typicals are at VCC = 5 V, TA = 25°C.
Note 3: Not more than one output should be shorted at a time, and the duration should not exceed one second.

Switching Characteristics
at VCC = 5 V and TA = 25°C, RL = 2 kΩ

                                                   CL = 15 pF      CL = 50 pF
Symbol  Parameter                                  Min    Max      Min    Max     Units
tPLH    Propagation Delay Time,                    3                      15      ns
        LOW-to-HIGH Level Output
tPHL    Propagation Delay Time,                    3               4      15      ns
        HIGH-to-LOW Level Output

FIGURE 2.19 74LS00 manufacturer's specifications. (Reprinted with permission from Fairchild Semiconductor and National Semiconductor.)

• Recommended operating conditions. These parameters specify the normal range of voltages and temperatures that the IC should be operated within such that its functionality is guaranteed to meet specifications set forth by the manufacturer. Two of the most important specifications in this section are the supply voltage (commonly labeled as either VCC or VDD, depending on whether the IC is built in a bipolar or MOS process) and the operating temperature. An IC may have multiple supply voltage specifications, because an IC can actually operate on several different voltages simultaneously. Each supply voltage may power a different portion of the chip. When the manufacturer specifies supply voltage, it does so with a certain tolerance, usually either ±5 or ±10 percent. Many 5-V logic ICs are guaranteed to operate only at a supply voltage from 4.75 to 5.25 V (±5 percent). Operating temperature is very important, because it affects the timing of the device. As a semiconductor heats up, it slows down. As it cools, its speed increases. Outside of the recommended operating temperature range, the device is not guaranteed to function, because the effects of temperature may become severe enough to compromise functionality. There are four common temperature ranges for ICs: commercial (0 to 70°C), industrial (-40 to 85°C), automotive (-40 to 125°C), and military (-55 to 125°C). It is more difficult to manufacture an IC that operates over wider temperature ranges. As such, more demanding temperature grades are often more expensive than the commercial grade. 

Other parameters establish the safe operating limits for input signals as well as the applied volt- 
age thresholds that represent logic 0 and 1 states. Minimum and maximum input levels are ex- 
pressed as either absolute voltages or voltages relative to the supply voltage pins of the device. 
Exceeding these voltages may damage the device. Logic threshold specifications are provided to 
ensure that the logic input voltages are such that the device will function as intended and not confuse a 1 for a 0, or vice versa. There is also a limit to how much current a digital output can drive. 
Current output specifications should be known so that a chip is not overloaded, which could result 
in either permanent damage to the chip or the chip’s failure to meet its published specifications. 

• DC electrical characteristics. DC parameters specify the voltages and currents that the IC will 
present to other circuitry to which it is connected. Whereas recommended operating conditions 
specify the environment under which the chip will properly operate, DC electrical characteristics 
specify the environment that the chip itself will create. Output voltage specifications define the 
logic 0 and 1 thresholds that the chip is guaranteed to drive under all legal operating conditions. 
These specifications confirm that the chip is compatible with other chips in the same family and 
also allow an engineer to determine if the output levels are compatible with another chip that it 
may be driving. 

Input current specifications characterize the load that the chip presents to whatever circuit is 
driving it. When either logic state is applied to the chip, a small current flows between the driver 
and the chip in question. Quantifying these currents enables an engineer to ensure compatibility 
between multiple ICs. When one IC drives several other ICs, the sum of the input currents should 
not exceed the output current specification of the driver. 

• AC electrical characteristics (or switching characteristics). AC parameters often represent the 
greatest complexity and level of detail in a digital IC’s specifications. They are the guaranteed 
timing parameters of inputs and outputs. If the IC is purely combinatorial (e.g., 74LS00), timing 
may just be a matter of specifying propagation delays and rise and fall times. Logic ICs with 
synchronous elements (e.g., flops) have associated parameters such as setup, hold, clock frequency, 
and output valid times. 
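The two DC compatibility checks described above, guaranteed output levels against input thresholds and summed input currents against the driver's output rating, reduce to simple arithmetic. The Python sketch below shows both. The numeric limits are assumptions chosen to resemble LS-TTL-style parts (the 2.5 V minimum V_OH matches the 74LS00 figure discussed below); the names `noise_margins`, `Load`, and `drive_ok` are hypothetical, and a real design should use the limits from the actual data sheets.

```python
from typing import NamedTuple


class Load(NamedTuple):
    name: str
    iih_ua: int  # input current drawn when driven high (magnitude, microamps)
    iil_ua: int  # input current sourced when driven low (magnitude, microamps)


def noise_margins(voh_min: float, vol_max: float,
                  vih_min: float, vil_max: float) -> tuple[float, float]:
    """Return (high, low) margins in volts; both must be >= 0 for compatibility."""
    return voh_min - vih_min, vil_max - vol_max


def drive_ok(ioh_ua: int, iol_ua: int, loads: list[Load]) -> bool:
    """True if the summed load currents stay within the driver's rating in both states."""
    total_high = sum(ld.iih_ua for ld in loads)
    total_low = sum(ld.iil_ua for ld in loads)
    return total_high <= ioh_ua and total_low <= iol_ua


# Assumed LS-TTL-style limits (illustrative only; check the real data sheets).
nmh, nml = noise_margins(voh_min=2.5, vol_max=0.5, vih_min=2.0, vil_max=0.8)
print(f"high margin {nmh:.2f} V, low margin {nml:.2f} V")  # high margin 0.50 V, low margin 0.30 V

ls_input = Load("LS input", iih_ua=20, iil_ua=400)
std_input = Load("standard TTL input", iih_ua=40, iil_ua=1600)
print(drive_ok(400, 8000, [ls_input] * 10))                    # True
print(drive_ok(400, 8000, [ls_input] * 10 + [std_input] * 3))  # False: 8800 uA > 8000 uA low
```

Working in integer microamps avoids floating-point rounding when totals land exactly on a limit.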

Keep in mind that each manufacturer has a somewhat different style of presenting these specifica- 
tions. The necessary information should exist, but data sheet sections may be named differently; they 
may include certain information in different groupings, and terminology may be slightly different. 

Specifications may be provided in mixed combinations of minimum, typical/nominal, and 
maximum. When a minimum or maximum limit is not specified, it is understood to be self-evi- 
dent or subject to a physical limitation that is beyond the scope of the device. Using Fairchild's 
74LS00 as an example, no minimum output current is specified, because the physical minimum 
is very near zero. The actual output current is determined by the load that is being driven, 
assuming that the load draws no more than the specified maximum. Other specifications are given 
under stated operating conditions. A well-written data sheet provides guaranteed specifications 
under worst-case conditions. Here, the logic 1 output voltage (V_OH) is specified as a minimum of 
2.5 V under conditions of minimum supply voltage (V_CC), maximum output current (I_OH), and 
maximum logic-low input voltage (V_IL). These are worst-case conditions: when V_CC decreases, 
so will V_OH, and when I_OH increases, it places a greater load on the output, dragging V_OH down 
to its lowest level. 
Timing specifications may also be incomplete. Manufacturers do not always guarantee minimum 
or maximum parameters, depending on the specific type of device and the particular specification. 
As with DC voltages, worst-case parameters should always be specified. When a minimum or maxi- 
mum delay is not specified, it is generally because that parameter is of secondary importance, and 
the manufacturer was unable to control its process to a sufficient level of detail to guarantee that 
value. In many situations where incomplete specifications are given, there are acceptable reasons for 
doing so, and the lack of information does not hurt the quality of the design. 

Typical timing numbers are not useful in many circumstances, because they do not represent a 
limit of the device’s operation. A thorough design must take into account the best and worst perfor- 
mance of each IC in the circuit so that one can guarantee that the circuit will function under all con- 
ditions. Therefore, worst-case timing parameters are usually the most important to consider first, 
because they are the dominant limit of a digital system’s performance in most cases. In more 
advanced digital systems, minimum parameters can become equally important because of the need 
to meet hold time, thereby ensuring that a signal does not change too quickly, before the driven 
IC can properly sense its logic level. 
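To make the worst-case and best-case reasoning concrete, the Python sketch below checks a single flop-to-flop path: maximum delays determine setup margin, and minimum delays determine hold margin. It assumes one clock with no skew between the two flops, and all delay values and names (`PathTiming`, `setup_slack`, `hold_slack`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Delays in nanoseconds for one flop-to-flop path (hypothetical values)."""
    tco_min: float  # launching flop clock-to-output, fastest
    tco_max: float  # launching flop clock-to-output, slowest
    tpd_min: float  # combinatorial logic between the flops, fastest
    tpd_max: float  # combinatorial logic between the flops, slowest
    tsu: float      # capturing flop setup time
    th: float       # capturing flop hold time


def setup_slack(p: PathTiming, t_clk: float) -> float:
    # Worst case: the slowest arrival must precede the next clock edge by t_su.
    return t_clk - (p.tco_max + p.tpd_max + p.tsu)


def hold_slack(p: PathTiming) -> float:
    # Best case: the fastest change must not arrive until t_h after the same edge.
    return (p.tco_min + p.tpd_min) - p.th


path = PathTiming(tco_min=2.0, tco_max=8.0, tpd_min=1.0, tpd_max=12.0, tsu=5.0, th=2.0)
print(setup_slack(path, t_clk=30.0))  # 5.0 ns of margin at a 30 ns clock period
print(hold_slack(path))               # 1.0 ns of margin
```

A negative result from either function indicates a violation. If only typical delays were available, neither margin could be guaranteed.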

Output timing specifications are often specified with an assumed set of loading conditions, be- 
cause the current drawn by the load has an impact on the output driver’s ability to establish a valid 
logic level. A small load will enable the IC to switch its output faster, because less current is de- 
manded of the output. A heavier load has the opposite effect, because it draws more current, which 
places a greater strain on the output driver. 




CHAPTER 3 

Basic Computer Architecture 



Microprocessors are central components of almost all digital systems, because combinations of 
hardware and software are used to solve design problems. A computer is formed by combining a mi- 
croprocessor with a mix of certain basic elements and customized logic. Software runs on a micro- 
processor and provides a flexible framework that orchestrates the behavior of hardware that has been 
customized to fit the application. When many people think about computers, images of desktop PCs 
and laptops come to their minds. Computers are much more diverse than the stereotypical image and 
permeate everyday life in increasing numbers. Small computers control microwave ovens, tele- 
phones, and CD players. 

Computer architecture is fundamental to the design of digital systems. Understanding how a basic 
computer is designed enables a digital system to take shape by using a microprocessor as a central 
control element. The microprocessor becomes a programmable platform upon which the major com- 
ponents of an algorithm can be implemented. Digital logic can then be designed to surround the mi- 
croprocessor and assist the software in carrying out a specific set of tasks. 

The first portion of this chapter explains the basic elements of a computer, including the micro- 
processor, memory, and input/output devices. Basic microprocessor operation is presented from a 
hardware perspective to show how instructions are executed and how interaction with other system 
components is handled. Interrupts, registers, and stacks are introduced as well to provide an overall 
picture of how computers function. Following this basic introduction is a complete example of how 
an actual eight-bit computer might be designed, with detailed descriptions of bus operation and ad- 
dress decoding. 

Once basic computer architecture has been discussed, common techniques for improving and 
augmenting microprocessor capabilities are covered, including direct memory access and bus expan- 
sion. These techniques are not relegated to high-end computing but are found in many smaller digital 
systems in which it is more economical to add a little extra hardware to achieve feature and perfor- 
mance goals instead of having to use a microprocessor that may be too complex and more expensive 
than desired. 

The chapter closes with an introduction to assembly language and microprocessor addressing 
modes. Writing software is not a primary topic of this book, but basic software design is an insepara- 
ble part of digital systems design. Without software, a computer performs no useful function. As- 
sembly language basics are presented in a general manner, because each microprocessor has its own 
instruction set and assembly language, requiring specific reading focused on that particular device. 
Basic concepts, however, are universal across different microprocessor implementations and serve to 
further explain how microprocessors actually function. 
3.1 THE DIGITAL COMPUTER 



A digital computer is a collection of logic elements that can execute arbitrary algorithms to perform 
data calculation and manipulation functions. A computer is composed of a microprocessor, memory, 
and some input/output (I/O) elements as shown in Fig. 3.1. The microprocessor, often called a mi- 
croprocessor unit (MPU) or central processing unit (CPU), contains logic to step through an algo- 
rithm, called a program, that has been stored in the computer’s program memory. The data used and 
manipulated by that program is held in the computer’s data memory. Memory is a repository for data 
that is usually organized as a linear array of individually accessible locations. The microprocessor 
can access a particular location in memory by presenting a memory address (the index of the desired 
location) to the memory element. I/O elements enable the microprocessor to communicate with the 
outside world to acquire new data and present the results of its programmed computations. Such ele- 
ments can include a keyboard or display controller. 

Programs are composed of many very simple individual operations, called instructions, that spec- 
ify in exact detail how the microprocessor should carry out an algorithm. A simple program may 
have dozens of instructions, whereas a complex program can have tens of millions of instructions. 
Collectively, the programs that run on microprocessors are called software, in contrast to the hard- 
ware on which they run. Each type of microprocessor has its own instruction set that defines the full 
set of unique, discrete operations that it is capable of executing. These instructions perform very nar- 
row tasks that, on their own, may seem insignificant. However, when thousands or millions of these 
tiny instructions are strung together, they may create a video game or a word processor. 

A microprocessor possesses no inherent intelligence or capability to spontaneously begin per- 
forming useful work. Each microprocessor is constructed with an instruction set that can be invoked 
in arbitrary sequences. Therefore, a microprocessor has the potential to perform useful work but will 
do nothing of the sort on its own. To make the microprocessor perform useful work, it requires ex- 
plicit guidance in the form of software programming. A task of even moderate complexity must be 
broken down into many tiny steps to be implemented on a microprocessor. These steps include basic 
arithmetic, Boolean operations, loading data from memory or an input element such as a keyboard, 
and storing data back to memory or an output element such as a printer. 

Memory structure is one of a computer’s key characteristics, because the microprocessor is al- 
most constantly accessing it to retrieve a new instruction, load new data to operate on, or store a cal- 
culated result. While program and data memory are logically distinct classifications, they may share 
the same physical memory resource. Random access memory (RAM) is the term used to describe a 
generic memory resource whose locations can be accessed, or addressed, in an arbitrary order and 
either read or written. A read is the process of retrieving data from a memory address and loading it 
into the microprocessor. A write is the process of storing data to a memory address from the 
microprocessor. Both programs and data can occupy RAM. Consider your desktop computer. When you 
execute a program that is located on the disk drive, that program is first loaded into the computer’s 
RAM and then executed from a region set aside for program memory. As on a desktop computer, 
RAM is most often volatile, meaning that it loses its contents when the power is turned off. 

FIGURE 3.1 Generic computer block diagram. 
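The read and write operations described above can be modeled with a linear array indexed by address. The Python sketch below is a hypothetical model, not a description of any real RAM device. It assumes byte-wide locations, and for simplicity it clears memory on a power cycle, whereas real RAM powers up with unpredictable contents.

```python
class RAM:
    """Volatile read/write memory modeled as a linear array of byte-wide locations."""

    def __init__(self, size: int):
        self.size = size
        self.cells = bytearray(size)

    def read(self, addr: int) -> int:
        # Read: copy the byte stored at 'addr' out to the microprocessor.
        return self.cells[addr]

    def write(self, addr: int, value: int) -> None:
        # Write: store a byte from the microprocessor at 'addr'.
        self.cells[addr] = value & 0xFF

    def power_cycle(self) -> None:
        # Volatile: contents are lost. Real RAM powers up with unpredictable
        # contents; this model simply clears every location to zero.
        self.cells = bytearray(self.size)


ram = RAM(1024)
ram.write(0x010, 0x2A)
print(ram.read(0x010))   # 42
ram.power_cycle()
print(ram.read(0x010))   # 0
```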

Some software cannot be stored in volatile memory, because basic initialization instructions, or 
boot code , must be present when the computer is turned on. Remember that a microprocessor can do 
nothing useful without software being readily available. When power is first applied to a computer, 
the microprocessor must be able to quickly locate boot code so that it can get itself ready to accept 
input from a user or load a program from an input device. This startup sequence is called booting , 
hence the term boot code. When you turn your computer on, the first messages that it displays on the 
monitor are a product of its boot code. Eventually, the computer is able to access its disk drive and 
begins loading software into RAM as part of its normal operation. To ensure that boot code is ready 
at power-up, nonvolatile memory called read-only memory (ROM) is used. ROM can store both 
programs and any data that must be present and immediately accessible at power-up. 
Software contained in ROM is also known as firmware. As its name implies, ROM can only be read 
but not written. More complex computers contain a relatively small quantity of ROM to hold basic 
boot code that then loads main operating software from another device into RAM. Small computers 
may contain all of their software in ROM. Figure 3.2 shows how ROM and RAM complement each 
other in a typical computer architecture. 

A microprocessor connects to devices such as memory and I/O via data and address buses. Col- 
lectively, these two buses can be referred to as the microprocessor bus. A bus is a collection of wires 
that serve a common purpose. The data bus is a bit array of sufficient size to communicate one 
complete data unit at a time. Most often, the data bus is one or more bytes in width. An eight-bit 
microprocessor, operating on one byte at a time, almost always has an eight-bit data bus. A 32-bit 
microprocessor, capable of operating on up to 4 bytes at a time, can have a data bus that is 32, 16, or 
8 bits wide. The exact data bus width is implementation specific and varies according to the intended 
application of the microprocessor. A narrower bus width means that it will take more time to com- 
municate a quantity of data as compared to a wider bus. Common notation for a data bus is D[7:0] 
for an 8-bit bus and D[31:0] for a 32-bit bus, where 0 is the least-significant bit. 
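The cost of a narrower data bus can be estimated by counting transfers. The Python sketch below is idealized. It assumes one transfer per bus cycle and ignores alignment, wait states, and protocol overhead. The function name `bus_transfers` is hypothetical.

```python
def bus_transfers(num_bytes: int, bus_width_bits: int) -> int:
    """Bus transfers needed to move num_bytes over a data bus of the given width."""
    bytes_per_transfer = bus_width_bits // 8
    # Round up: a partial final transfer still occupies a full bus cycle.
    return (num_bytes + bytes_per_transfer - 1) // bytes_per_transfer


for width in (8, 16, 32):
    print(f"D[{width - 1}:0]: {bus_transfers(4, width)} transfer(s) per 32-bit word")
# D[7:0]: 4, D[15:0]: 2, D[31:0]: 1
```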

The address bus is a bit array of sufficient size to fully express the microprocessor’s address 
space. Address space refers to the maximum amount of memory and I/O that a microprocessor can 
directly address. If a microprocessor has a 16-bit address bus, it can address up to 2^16 = 65,536 
bytes. Therefore, it has a 64 kB address space. The entire address space does not have to be used; it 
simply establishes a maximum limit on memory size. Common notation for a 16-bit address bus is 
A[15:0], where 0 is the least-significant bit. Figure 3.3 shows a typical microprocessor bus 
configuration in a computer. Note that the address bus is unidirectional (the microprocessor asserts 
requested addresses to the various devices), and the data bus is bidirectional (the microprocessor 
asserts data on a write and the devices assert data on reads). 
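The address-space arithmetic above, along with the logic level of an individual address line, can be checked with a few lines of Python. This sketch assumes byte addressing, as the text does, and the helper names are hypothetical.

```python
def address_space(addr_bits: int) -> int:
    """Distinct locations an address bus of addr_bits can select (2 ** addr_bits)."""
    return 1 << addr_bits


def address_bit(addr: int, n: int) -> int:
    """Logic level of address line A[n] for a given address."""
    return (addr >> n) & 1


size = address_space(16)
print(size, f"bytes = {size // 1024} kB")                # 65536 bytes = 64 kB
print(f"A[15:0] spans 0x0000 to 0x{size - 1:04X}")       # 0x0000 to 0xFFFF
print(address_bit(0x8000, 15), address_bit(0x7FFF, 15))  # 1 0
```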




FIGURE 3.2 Basic ROM/RAM memory complement. 
FIGURE 3.3 Microprocessor buses. 

A microprocessor’s entire address space is never occupied by a single function; rather, it is 
shared by ROM, RAM, and various I/Os. Each device is mapped into its own region of the address 
space and is enabled only when the microprocessor asserts an address within a device's mapped re- 
gion. The process of recognizing that an address is within a desired region is called decoding. Ad- 
dress decoding logic is used to divide the overall address space into smaller sections in which 
memory and I/O devices can reside. This logic generates individual signals that enable the appropri- 
ate device based on the state of the address bus so that the devices themselves do not need any 
knowledge of the specific computer’s unique address decoding. 
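As a sketch of address decoding, the Python function below examines only the upper address bits to choose which device enable (chip select) to assert, and it passes the remaining lower bits to that device. The memory map is entirely hypothetical and is not the map of any computer described in this chapter. In a real design, the I/O region would typically be decoded further among several devices.

```python
def decode(addr: int) -> tuple[str, int]:
    """Return (asserted chip select, address seen by that device) for a 16-bit address.

    Hypothetical memory map, decoded from the upper address bits only:
      A15 = 0        -> RAM, 0x0000-0x7FFF (32 kB), device sees A[14:0]
      A15:A14 = 10   -> I/O, 0x8000-0xBFFF,         device sees A[13:0]
      A15:A14 = 11   -> ROM, 0xC000-0xFFFF (16 kB), device sees A[13:0]
    """
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address outside the 16-bit address space")
    a15 = (addr >> 15) & 1
    a14 = (addr >> 14) & 1
    if a15 == 0:
        return "RAM_CS", addr & 0x7FFF
    if a14 == 0:
        return "IO_CS", addr & 0x3FFF
    return "ROM_CS", addr & 0x3FFF


for a in (0x0000, 0x7FFF, 0x8001, 0xC000, 0xFFFF):
    cs, local = decode(a)
    print(f"0x{a:04X} -> {cs}, device address 0x{local:04X}")
```

Each device sees only its own local address, which is why the devices need no knowledge of the overall memory map.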



3.2 MICROPROCESSOR INTERNALS 



The multitude of complex tasks performed by computers can be broken down into sequences of sim- 
ple operations that manipulate individual numbers and then make decisions based on those calcula- 
tions. Certain types of basic instructions are common across nearly every microprocessor in 
existence and can be classified as follows for purposes of discussion: 

• Arithmetic: add or subtract two values 

• Logical: Boolean (e.g., AND, OR, XOR, NOT, etc.) manipulation of one or two values 

• Transfer: retrieve a value from memory or store a value to memory 

• Branch: jump ahead or back to a particular instruction if a specified condition is satisfied 

Arithmetic and logical instructions enable the microprocessor to modify and manipulate specific 
pieces of data. Transfer instructions enable these data to be saved for later use and recalled when 
necessary from memory. Branch operations enable instructions to execute in different sequences, de- 
pending on the results of arithmetic and logical operations. For example, a microprocessor can com- 
pare two numbers and take one of two different actions if the numbers are equal or unequal. 

Each unique instruction is represented as a binary value called an opcode. A microprocessor 
fetches and executes opcodes one at a time from program memory. Figure 3.4 shows a hypothetical 
microprocessor to serve as an example for discussing how a microprocessor actually advances 
through and executes the opcodes that form programs. 

A microprocessor is a synchronous logic element that advances through opcodes on each clock 
cycle. Some opcodes may be simple enough to execute in a single clock cycle, and others may take 
multiple cycles to complete. Clock speed is often used as an indicator of a microprocessor’s perfor- 
mance. It is a valid indicator but certainly not the only one, because each microprocessor requires a 
different number of cycles for each instruction, and each instruction represents a different quantity 
of useful work. 
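A quick calculation shows why clock speed alone can mislead. The Python sketch below compares two hypothetical microprocessors that execute the same 1,000-instruction task. It assumes an identical instruction count for both, which is a simplification: as noted above, instructions on different microprocessors perform different amounts of work, so the count differs as well.

```python
def execution_time_us(instructions: int, avg_cycles_per_instr: float, clock_mhz: float) -> float:
    """Run time in microseconds: instructions x cycles per instruction / cycles per microsecond."""
    return instructions * avg_cycles_per_instr / clock_mhz


# Two hypothetical microprocessors executing the same 1,000-instruction task.
cpu_a = execution_time_us(1000, avg_cycles_per_instr=1.0, clock_mhz=20.0)
cpu_b = execution_time_us(1000, avg_cycles_per_instr=4.0, clock_mhz=40.0)
print(cpu_a, cpu_b)  # 50.0 100.0
```

The 40 MHz part takes twice as long because it needs four times as many cycles per instruction.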
FIGURE 3.4 Simple microprocessor. 

When an opcode is fetched from memory, it must be briefly examined to determine what needs to 
be done, after which the appropriate actions are carried out. This process is called instruction decod- 
ing. A central logic block coordinates the operation of the entire microprocessor by fetching instruc- 
tions from memory, decoding them, and loading or storing any data as required. The accumulator is 
a register that temporarily holds data while it is being processed. Execution of an instruction to load 
the accumulator with a byte from memory would begin with a fetch of the opcode that represents 
this action. The instruction decoder would then recognize the opcode and initiate a memory read via 
the same microprocessor bus that was used to fetch the opcode. When the data returns from memory, 
it would be loaded into the accumulator. While there may be multiple distinct logical steps in decod- 
ing an instruction, the steps may occur simultaneously or sequentially, depending on the architecture 
of the microprocessor and its decoding logic. 

The accumulator is sized to hold the largest data value that the microprocessor can handle in a 
single arithmetic or logical instruction. When engineers talk of an 8-bit or 32-bit microprocessor, 
they are usually referring to the internal data-path width — the size of the accumulator and the arith- 
metic logic unit (ALU). The ALU is sometimes the most complex single logic element in a micro- 
processor. It is responsible for performing arithmetic and logical operations as directed by the 
instruction decode logic. Not only does the ALU add or subtract data from the accumulator, it also 
keeps track of status flags that tell subsequent branch instructions whether the result was positive, 
negative, or zero, and whether an addition or subtraction operation created a carry or borrow bit. 
These status bits are also updated for logical operations such as AND or OR so that software can 
take different action if a logical comparison is true or false. 
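The Python sketch below models an 8-bit ALU that returns a result together with zero, negative, and carry flags. It is a hypothetical illustration, and two of its choices are assumptions. First, carry on subtraction here means "a borrow was needed"; some real microprocessors use the opposite convention. Second, logical operations clear carry in this sketch, whereas some real microprocessors leave it unchanged.

```python
from typing import NamedTuple


class Flags(NamedTuple):
    zero: bool      # result is zero
    negative: bool  # bit 7 of the result is set (two's-complement sign)
    carry: bool     # ADD: carry out of bit 7; SUB: a borrow was needed


def alu(op: str, a: int, b: int) -> tuple[int, Flags]:
    """Hypothetical 8-bit ALU returning (result, status flags)."""
    if op == "ADD":
        full = a + b
        carry = full > 0xFF
    elif op == "SUB":
        full = a - b
        carry = full < 0          # borrow convention (varies between CPUs)
    elif op == "AND":
        full, carry = a & b, False
    elif op == "OR":
        full, carry = a | b, False
    else:
        raise ValueError(f"unsupported operation: {op}")
    result = full & 0xFF          # keep only the 8-bit data-path width
    return result, Flags(zero=result == 0, negative=bool(result & 0x80), carry=carry)


print(alu("ADD", 0xFF, 0x01))  # (0, Flags(zero=True, negative=False, carry=True))
print(alu("SUB", 0x03, 0x05))  # (254, Flags(zero=False, negative=True, carry=True))
```

A subsequent branch instruction would test one of these flags to decide whether to reload the program counter.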

For ease of presentation, the microprocessor in Fig. 3.4 is shown having a single general-purpose 
accumulator register. Most real microprocessors contain more than one internal register that can be 
used for general manipulation operations. Some microprocessors have as few as one or two such 
registers, and some have dozens or more than a hundred. It is the concept of an accumulator that is 
discussed here, but there is no conceptual limitation on how many accumulators or registers a micro- 
processor can have. 

A microprocessor needs a mechanism to keep track of its place in the instruction sequence. Like a 
bookmark that saves your place as you read through a book, the program counter (PC) maintains the 
address of the next instruction to be fetched from program memory. The PC is a counter that can be 
reloaded with a new value from the instruction decoder. Under normal operation, the microprocessor 
moves through instructions sequentially. After executing each instruction, the PC is incremented, 
and a new instruction is fetched from the address indicated by the PC. The major exception to this 
linear behavior is when branch instructions are encountered. Branch instructions exist specifically to 
override the sequential execution of instructions. When the instruction decoder fetches a branch 
instruction, it must determine the condition for the branch. If the condition is met (e.g., the ALU zero 
flag is asserted), the branch target address is loaded into the PC. Now, when the instruction decoder 
goes to fetch the next instruction, the PC will point to a new part of the instruction sequence instead 
of simply the next program memory location. 
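The fetch, decode, and execute cycle, together with the program counter and a conditional branch, can be simulated in a few dozen lines. The Python sketch below models a hypothetical accumulator machine in the spirit of Fig. 3.4. The opcode values, the one-byte address operand, and the 256-byte memory are all assumptions made for illustration and do not match any real instruction set. The sample program multiplies 7 by 3 through repeated addition.

```python
# Hypothetical opcodes. Every instruction except HALT is followed by a
# one-byte memory address operand.
LDA, ADD, SUB, STA, BEQ, JMP, HALT = 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0xFF


def run(mem: bytearray, pc: int = 0, max_steps: int = 1000) -> int:
    """Fetch, decode, and execute until HALT; return the final accumulator value."""
    acc = 0
    zero = False  # ALU zero flag
    for _ in range(max_steps):
        opcode = mem[pc]                 # fetch the opcode the PC points to
        if opcode == HALT:
            return acc
        operand = mem[pc + 1]            # fetch the address operand
        pc += 2                          # default: advance to the next instruction
        if opcode == LDA:
            acc = mem[operand]
        elif opcode == ADD:
            acc = (acc + mem[operand]) & 0xFF
        elif opcode == SUB:
            acc = (acc - mem[operand]) & 0xFF
        elif opcode == STA:
            mem[operand] = acc
        elif opcode == BEQ:
            if zero:
                pc = operand             # branch taken: reload the PC
        elif opcode == JMP:
            pc = operand                 # unconditional branch
        else:
            raise ValueError(f"invalid opcode {opcode:#04x}")
        if opcode in (LDA, ADD, SUB):
            zero = acc == 0              # update the flag from the new result
    raise RuntimeError("step limit reached")


# Multiply 7 by 3 with repeated addition. Data lives at 0x20-0x23.
COUNT, PRODUCT, ONE, SEVEN = 0x20, 0x21, 0x22, 0x23
program = [LDA, COUNT, BEQ, 16,       # 0x00: if COUNT == 0, branch to HALT
           SUB, ONE, STA, COUNT,      # 0x04: COUNT = COUNT - 1
           LDA, PRODUCT, ADD, SEVEN,  # 0x08: PRODUCT = PRODUCT + 7
           STA, PRODUCT, JMP, 0x00,   # 0x0C: store, then loop back
           HALT]                      # 0x10
mem = bytearray(256)
mem[0:len(program)] = bytes(program)
mem[COUNT:SEVEN + 1] = bytes([3, 0, 1, 7])
run(mem)
print(mem[PRODUCT])  # 21
```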



3.3 SUBROUTINES AND THE STACK 



Most programs are organized into multiple blocks of instructions called subroutines rather than a 
single large sequence of instructions. Subroutines are located apart from the main program segment 
and are invoked by a subroutine call. This call is a type of branch instruction that temporarily jumps 
the microprocessor's PC to the subroutine, allowing it to be executed. When the subroutine has 
completed, control is returned to the program segment that called it via a return-from-subroutine 
instruction. Subroutines provide several benefits to a program, including modularity and ease of reuse. 
A modular subroutine is one that can be called from different parts of the same program while still 
performing the same basic function. An example of a modular subroutine is one that sorts a list of 
numbers in ascending order. This sorting subroutine can be called by multiple sections of a program and 
will perform the same operation on multiple lists. Reuse is related to modularity and takes the 
concept a step further by enabling the subroutine to be transplanted from one program to another 
without modification. This concept greatly speeds the software development process. 

Almost all microprocessors provide inherent support for subroutines in their architectures and in- 
struction sets. Recall that the program counter keeps track of the next instruction to be executed and 
that branch instructions provide a mechanism for loading a new value into the PC. Most branch in- 
structions simply cause a new value to be loaded into the PC when their specific branch condition is 
satisfied. Some branch instructions, however, not only reload the PC but also instruct the micropro- 
cessor to save the current value of the PC off to the side for later recall. This stored PC value, or sub- 
routine return address, is what enables the subroutine to eventually return control to the program 
that called it. Subroutine call instructions are sometimes called branch-to-subroutine or 
jump-to-subroutine, and they may be unconditional. 

When a branch-to-subroutine is executed, the PC is saved into a data structure called a stack. The 
stack is a region of data memory that is set aside by the programmer primarily for storing the 
microprocessor’s state information when it branches to a subroutine. Other uses for the stack will be 
mentioned shortly. A stack is a last-in, first-out memory structure. When data is stored on the stack, 
it is pushed on. When data is removed from the stack, it is popped off. Popping the stack recalls the 
most recently pushed data. The first datum to be pushed onto the stack will be the last to be popped. 
A stack pointer (SP) holds a memory address that identifies the top of the stack at any given time. 
The SP decrements as entries are pushed on and increments as they are popped off, thereby growing 
the stack downward in memory as data is pushed on, as shown in Fig. 3.5. 
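The Python sketch below models a downward-growing stack in byte-addressed memory. It is hypothetical and uses one common convention, in which SP points at the next free location: a push writes at SP and then decrements it, so a pop must increment SP before reading. Real microprocessors differ in whether SP points at the last entry or at the next free slot. As on real hardware, nothing here checks for underflow, which the final pop demonstrates.

```python
class Stack:
    """Hypothetical downward-growing stack in byte-addressed memory.

    SP points at the next free location: push writes at SP and then
    decrements it; pop increments SP and then reads. There is no
    underflow or overflow checking.
    """

    def __init__(self, mem: bytearray, top: int):
        self.mem = mem
        self.sp = top  # initial top of stack, chosen by the programmer

    def push(self, value: int) -> None:
        self.mem[self.sp] = value & 0xFF
        self.sp -= 1

    def pop(self) -> int:
        self.sp += 1
        return self.mem[self.sp]


mem = bytearray(256)
mem[0xF0] = 0x99                 # unrelated variable just above the stack
stack = Stack(mem, top=0xEF)
stack.push(0x11)
stack.push(0x22)                 # SP is now 0xED
print(hex(stack.pop()), hex(stack.pop()))  # 0x22 0x11 (last in, first out)
print(hex(stack.pop()))          # 0x99: popping an empty stack returns stray data
```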

By pushing the PC onto the stack during a branch-to-subroutine, the microprocessor now has a 
means to return to the calling routine at any time by restoring the PC to its previous value by simply 
popping the stack. This operation is performed by a return-from-subroutine instruction. Many mi- 
croprocessors push not only the PC onto the stack when calling a subroutine, but the accumulator 
and ALU status flags as well. While this increases the complexity of a subroutine call and return 
somewhat, it is useful to preserve the state of the calling routine so that it may resume control 
smoothly when the subroutine ends. 
FIGURE 3.5 Generic stack operation. 



The stack can store multiple entries, enabling multiple subroutines to be active at the same time. 
If one subroutine calls another, the microprocessor must keep track of both subroutines’ return ad- 
dresses in the order in which the subroutines have been called. This subroutine nesting process of 
one calling another subroutine, which calls another subroutine, naturally conforms to the last-in, 
first-out operation of a stack. 
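Nested subroutine calls map naturally onto a last-in, first-out structure. The Python sketch below uses a Python list in place of stack memory to track return addresses for a hypothetical call sequence. It assumes a 3-byte call instruction, so the saved return address is the address of the call plus 3. The addresses and the names `call` and `ret` are illustrative only.

```python
CALL_LENGTH = 3  # hypothetical: opcode plus a two-byte target address


def call(return_stack: list[int], pc: int, target: int) -> int:
    """Branch-to-subroutine: push the address after the call, return the new PC."""
    return_stack.append(pc + CALL_LENGTH)
    return target


def ret(return_stack: list[int]) -> int:
    """Return-from-subroutine: pop the most recently saved return address."""
    return return_stack.pop()


stack: list[int] = []
pc = call(stack, pc=0x0100, target=0x0200)  # main calls subroutine A
pc = call(stack, pc=0x0210, target=0x0300)  # A runs to its own call at 0x0210 and calls B
print([hex(a) for a in stack])              # ['0x103', '0x213']
pc = ret(stack)                             # B returns into A
print(hex(pc))                              # 0x213
pc = ret(stack)                             # A returns into main
print(hex(pc))                              # 0x103
```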

To implement a stack, a microprocessor contains a stack pointer register that is loaded by the pro- 
grammer to establish the initial starting point, or top, of the stack. Figure 3.6 shows the hypothetical 
microprocessor in more complete form with a stack pointer register. 

Like the PC, the SP is a counter that is automatically modified by certain instructions. Not only do 
subroutine branch and return instructions use the stack, there are also general-purpose push/pop in- 
structions provided to enable the programmer to use the stack manually. The stack can make certain 
calculations easier by pushing the partial results of individual calculations and then popping them as 
they are combined into a final result. 

The programmer must carefully manage the location and size of the stack. A microprocessor will 
freely execute subroutine call, subroutine return, push, and pop instructions whenever they are en- 
countered in the software. If an empty stack is popped, the microprocessor will oblige by reading 
back whatever data value is present in memory at the time and then incrementing the SP. If a full 
stack is pushed, the microprocessor will write the specified data to the location pointed to by the SP 
and then decrement it. Depending on the exact circumstances, either of these operations can corrupt 
other parts of the program or data that happens to be in the memory location that gets overwritten. It 
is the programmer’s responsibility to leave enough free memory for the desired stack depth and then 
to not nest too many subroutines simultaneously. The programmer must also ensure that there is 
symmetry between push/pop and subroutine call/return operations. Issuing a return-from-subroutine 
instruction while already in the main program would lead to undesirable results when the 
microprocessor reloads the PC with an incorrect return address. 

FIGURE 3.6 Microprocessor with stack pointer register. 



3.4 RESET AND INTERRUPTS 



Thus far, the steady-state operation of a microprocessor has been discussed in which instructions are 
fetched, decoded, and executed in an order determined by the PC and branch instructions. There are 
two special cases in which the microprocessor does not follow this regular pattern of operation. The 
first case is at power-up, when the microprocessor must transition from an idle state to executing in- 
structions. This transition sequence is called reset and involves the microprocessor fetching its boot 
code from memory to begin the programmed software sequence. Reset is triggered by asserting a 
particular logic level onto a microprocessor pin and can occur either at power-up or at any arbitrary 
time when it is desired to restart, or reboot, the microprocessor from a known initial state. Some mi- 
croprocessors have special instructions that can actually trigger a soft reset. 

The question arises of how the microprocessor determines which instruction to execute first when 
it has just been reset. To solve this problem, each microprocessor has a reset vector that points it to a 
fixed, predetermined memory address where the programmer must locate the first instruction of the 
boot sequence. The reset vector is specified by the microprocessor's designer. Some microprocessors 
locate the reset vector at the beginning of memory and some place it toward the end of the address 
space. Sometimes the main body of the program will be located in another portion of memory, and 
the first instruction at the reset vector will contain a branch instruction to jump to the desired loca- 
tion. 

The second case in which the microprocessor does not follow the normal instruction sequence is 
during normal operation when an event occurs and the programmer wishes the microprocessor to 
pause what it is currently doing and handle the event with a special software routine. Such an event 
is called an interrupt. A common application for an interrupt is the implementation of a periodic, 
timed operation such as monitoring the temperature of a room. Because the room temperature does 
not change often, the microprocessor can handle other tasks during normal operation. A timer can be 
set to expire every few seconds, causing an interrupt event. When the interrupt triggers, the micro- 
processor can read the room temperature, take any appropriate action (e.g., turn on a ventilation fan), 
and then resume its normal operation. 

An interrupt can be triggered by asserting a special-purpose microprocessor interrupt signal. In- 
terrupt events can also be triggered from within a microprocessor via special instructions. When an 
interrupt occurs, the microprocessor saves its state by pushing the PC and other registers onto the 
stack, and then the PC is loaded with an interrupt vector that points to an interrupt service routine 
(ISR) in memory. In this way, the interrupt process is similar to a branch-to-subroutine. However, 
the interrupt may be triggered by an external hardware event instead of by software. Like reset, each 
interrupt pin on the microprocessor has an interrupt vector associated with it. The programmer 
knows that an ISR is to be located at a specific memory location to service a particular interrupt. 
When the ISR has completed, a return-from-interrupt instruction is executed that restores the micro- 
processor’s prior state by popping it from the stack. Control is then returned to the routine that was 
interrupted and normal execution proceeds. 
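The Python sketch below illustrates interrupt entry and exit: the microprocessor saves its context on the stack, loads the PC from the vector for the interrupting source, and later restores the saved context on return-from-interrupt. The vector addresses, the register set, and the names are hypothetical. Which registers real microprocessors save automatically, and which ones the ISR must save itself, varies from device to device.

```python
from dataclasses import dataclass, field

# Hypothetical vector table: interrupt source -> ISR start address.
VECTORS = {"TIMER": 0x0800, "UART": 0x0900}


@dataclass
class CPUState:
    pc: int
    acc: int
    flags: int
    stack: list = field(default_factory=list)


def take_interrupt(cpu: CPUState, source: str) -> None:
    # Save the interrupted context on the stack, then load the PC from the vector.
    cpu.stack.append((cpu.pc, cpu.acc, cpu.flags))
    cpu.pc = VECTORS[source]


def return_from_interrupt(cpu: CPUState) -> None:
    # Pop the saved context so the interrupted routine resumes unaware.
    cpu.pc, cpu.acc, cpu.flags = cpu.stack.pop()


cpu = CPUState(pc=0x0123, acc=0x42, flags=0b0010)
take_interrupt(cpu, "TIMER")
print(hex(cpu.pc))                # 0x800: the timer ISR begins executing
cpu.acc, cpu.flags = 0x99, 0      # the ISR is free to use the registers
return_from_interrupt(cpu)
print(hex(cpu.pc), hex(cpu.acc))  # 0x123 0x42
```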

As the interrupt mechanism executes, the program that gets interrupted does not necessarily have 
any knowledge of the event. Because the state of the microprocessor is saved and then restored dur- 
ing the return-from-interrupt, the main routine has no concept that somewhere along the way its exe- 
cution was paused for an arbitrary period. The programmer may choose to make such knowledge 
available by sharing information between the ISR and other routines, but this is left to individual 
software implementations. 
Multiple interrupt sources are common in microprocessors. Depending on the complexity of the 
microprocessor, there may be one, two, ten, or dozens of separate interrupt sources, each with its 
own vector. Conflicts in which multiple interrupt sources are activated at the same time are handled 
by assigning priorities to each interrupt. Interrupt priorities may be predetermined by the designer of 
the microprocessor or programmed by software. In a microprocessor with multiple interrupt priori- 
ties, once a higher-priority interrupt has taken control and its ISR is executing, lower-priority inter- 
rupts will remain pending until the current higher-priority ISR issues a return-from-interrupt. 

Interrupts can usually be turned off, or masked, by writing to a control register within the micro- 
processor. Masking an interrupt is useful, because an interrupt should not be triggered before the 
program has had a chance to set up the ISR or otherwise get ready to handle the interrupt condition. 
If the program is not yet ready and the microprocessor takes an interrupt by jumping to the interrupt 
vector, the microprocessor may crash by executing invalid instructions. 

Masking is also useful when performing certain time-critical operations. A task may be pro- 
grammed into an ISR that must complete within 10 µs. Under normal circumstances, the task is eas- 
ily accomplished in this period of time. However, if a competing interrupt is triggered during the 
time-critical ISR, there may be no guarantee of meeting the 10-µs requirement. One solution to this 
problem is to mask subsequent interrupts when the time-critical interrupt is triggered and then un- 
mask interrupts when the ISR has completed. If an interrupt arrives while masked, the microproces- 
sor will remember the interrupt request and trigger the interrupt when it is unmasked. 
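A masked interrupt that is remembered and taken later behaves like a latch gated by a mask bit. This Python sketch makes several assumptions. It models a single interrupt input whose mask is set out of reset. Its pending latch is cleared when the interrupt is taken, which is a simplification. The class and method names are hypothetical.

```python
class InterruptLine:
    """One maskable interrupt input with a pending latch (illustrative model)."""

    def __init__(self):
        self.masked = True     # assumption: masked out of reset until software is ready
        self.pending = False   # request latched but not yet serviced

    def request(self):
        """External hardware asserts the interrupt; the request is latched."""
        self.pending = True

    def set_mask(self, masked):
        """Software writes the mask bit in the control register."""
        self.masked = masked

    def take(self):
        """Return True, clearing the latch, if the interrupt is taken now."""
        if self.pending and not self.masked:
            self.pending = False
            return True
        return False
```

A time-critical ISR would call `set_mask(True)` on entry and `set_mask(False)` before returning. A request arriving in between stays latched and is taken as soon as the mask is cleared.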

Certain microprocessors have one or more interrupts that are classified as nonmaskable. This 
means that the interrupt cannot be disabled. Therefore, the hardware design of the computer must 
ensure that such an interrupt is not activated unless the software is able to respond to it. Non- 
maskable interrupts are generally used for low-level error recovery or debugging purposes where it 
must be guaranteed that the interrupt will be taken regardless of what the microprocessor is doing at 
the time. Nonmaskable ISRs are sometimes implemented in nonvolatile memory to ensure that they 
are always ready for execution. 



3.5 IMPLEMENTATION OF AN EIGHT-BIT COMPUTER 



Having discussed some of the basic principles of microprocessor architecture and operation, we can 
examine how a microprocessor fits into a system to form a computer. Microprocessors need external 
memory in which to store their programs and the data upon which they operate. In this context, ex- 
ternal memory is viewed from a logical perspective. That is, the memory is always external to the 
core microprocessor element. Some processor chips on the market actually contain a certain quantity 
of memory within them, but, logically speaking, this memory is still external to the actual micropro- 
cessor core. 

In the general sense, a computer requires a quantity of nonvolatile memory, or ROM, in which to 
store the boot code that will be executed on reset. The ROM may contain all or some of the micro- 
processor’s full set of software. A small embedded computer, such as the one in a microwave oven, 
contains all its software in ROM. A desktop computer contains very little of its software in ROM. A 
computer also requires a quantity of volatile memory, or RAM, that can be used to store data associ- 
ated with the various tasks running on the computer. RAM is where the microprocessor’s stack is lo- 
cated. Additionally, RAM can be used to hold software that is loaded from an external source. 

For purposes of discussion, consider the basic eight-bit computer shown in Fig. 3.7 with a small 
quantity of memory and a serial port with which to communicate with the outside world. Eight kilo- 
bytes of ROM is sufficient to store boot code and software, including a serial communications pro- 
gram. Eight kilobytes of RAM is sufficient to hold data associated with the ROM software, and it 
also enables loading additional software not already included in the ROM. The control signals in this 
hypothetical computer are active-low, as are those in most computer designs of the past few decades, 
following long-standing convention. Active-low signal names carry a symbol as a prefix or suffix that 
distinguishes them from active-high signals. Common symbols used for this purpose include # and *; 
this design uses the * suffix (RD*, WR*, and so on). From a logical perspective, it is perfectly valid 
to use active-high signaling. However, because most memory and peripheral devices conform to the 
active-low convention, it is often easier to go along with the established convention. 

FIGURE 3.7 Eight-bit computer block diagram. 

While hypothetical, the microprocessor shown has characteristics that are common in off-the-shelf 
eight-bit microprocessors. It contains an 8-bit data bus and a 16-bit address bus with a total address 
space of 64 kB. The combined MPU bus, consisting of address, data, and control signals, is 
asynchronous: transfers are controlled by the assertion of read and write enable signals rather than 
by a clock. When the microprocessor wants to read a location in memory, it drives the appropriate 
address along with RD* and then captures the resulting value driven onto the data bus. As shown in 
the diagram, memory chips usually have output enable (OE*) inputs that can be connected directly to 
the read enable. Such devices continuously decode the address bus and will drive data whenever both 
their chip select and OE* are active. 

Not all 64 kB of address space is used in this computer. Address decoding logic breaks the single 
64-kB space into four 16-kB regions. According to the state of A[15:14], one and only one of the 
chip select signals is activated. The address decoding follows the truth table shown in Table 3.1 and 
establishes four address ranges. 

TABLE 3.1 Address Decoding Truth Table 

A[15]   A[14]   Chip Select   Address Range 
  0       0       CS0*        0x0000-0x3FFF 
  0       1       CS1*        0x4000-0x7FFF 
  1       0       CS2*        0x8000-0xBFFF 
  1       1       none        0xC000-0xFFFF 

Once decoded into regions, A[13:0] provides unique address information to the memory and I/O 
devices connected to the MPU bus. One memory region, the upper 16 kB, is currently left unused. It 
may be used in the future if more memory or another I/O device is added. Each memory and I/O 
device has a chip select input and will respond to a read or write command only when that select 
signal is active. Furthermore, each chip, including the microprocessor, contains internal tri-state 
buffers to prevent contention on the bus. The tri-state buffers are not enabled unless the chip's 
select signal is active and a read is being performed (a write, in the case of the microprocessor). 
Without external address decoding, none of these chips could share the address space with any other 
device, because they do not have enough address bits to fully decode the entire 16-bit address bus. 
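The truth table in Table 3.1 reduces to a lookup on A[15:14]. This Python sketch returns the name of the asserted chip select instead of modeling active-low voltage levels. The names `decode` and `device_offset` are hypothetical.

```python
# Region decoder for the hypothetical eight-bit computer (Table 3.1).
CHIP_SELECTS = ("CS0*", "CS1*", "CS2*", None)  # indexed by A[15:14]; None = unused

def decode(address):
    """Return the chip select asserted for a 16-bit MPU address, or None."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return CHIP_SELECTS[(address >> 14) & 0b11]  # only A[15:14] are examined

def device_offset(address):
    """A[13:0], the portion of the address passed through to every device."""
    return address & 0x3FFF
```

Exactly one select is returned for any address in the first three regions, which is the "one and only one" property the decoder must guarantee.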

Not all address bits are used by the memory and serial port chips. The ROM and RAM are each 
only 8 kB in size. Therefore, only 13 address bits, A[12:0], are required and, as a result, A[13] is left 
unconnected. The serial port has far fewer memory locations and therefore uses only A[3:0], for a 
maximum of 16 unique addresses. 

When a device does not use all of the address bits allocated to its address region, the potential for 
aliasing exists. The ROM occupies only 8 kB (13 bits) of the 16-kB (14-bit) address region. 
Therefore, the ROM has no knowledge of any addresses above 8 kB: the region from 0x2000 to 
0x3FFF. What happens if the MPU tries to read location 0x2000? 0x2000 differs from 0x0000 only in 
the state of A[13]. Because the ROM has no knowledge of A[13], it interprets 0x2000 as 0x0000. In 
other words, 0x2000 aliases to 0x0000. Similarly, the entire upper 8 kB of the address region aliases 
to the lower 8 kB. In the case of the serial port controller, there is a greater degree of aliasing, 
because the serial port uses only A[3:0]. This means that there can be only 16 unique address 
locations in the entire 16-kB region. These 16 locations therefore appear to be replicated 
2^10 = 1,024 times, as indicated by the ten unused address bits, A[13:4]. 

As long as the software is properly written to understand the computer's memory map, it will 
properly access the memory locations that are available and will avoid aliased portions of the mem- 
ory map. Aliasing is not a problem in itself but can lead to problems if software does not access 
memory and peripherals in the way in which the hardware engineer intended. If software is written 
for the hypothetical computer with the incorrect assumption that 16 kB of RAM is present, data may 
be unwittingly corrupted when addresses between 0x6000 and 0x7FFF are written, because they will 
alias to 0x4000-0x5FFF and overwrite any existing data. 
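The aliasing arithmetic can be checked by masking off the bits a device does not connect. The ROM at 0x0000 and the RAM at 0x4000 follow from the discussion above. Placing the serial port on CS2* (0x8000) is an assumption made for this sketch. The names `DEVICES`, `device_location`, and `alias_count` are hypothetical.

```python
# (region base address, number of address bits the device actually uses)
DEVICES = {
    "ROM":    (0x0000, 13),  # 8 kB, A[12:0]
    "RAM":    (0x4000, 13),  # 8 kB, A[12:0]
    "SERIAL": (0x8000, 4),   # 16 registers, A[3:0]; CS2* placement assumed
}

def device_location(name, address):
    """Location the device actually sees for an MPU address in its region."""
    base, width = DEVICES[name]
    offset = address - base
    if not 0 <= offset < 0x4000:
        raise ValueError("address outside this device's 16-kB region")
    return offset & ((1 << width) - 1)  # unconnected upper bits are ignored

def alias_count(name):
    """How many times the device repeats within its 14-bit region."""
    _, width = DEVICES[name]
    return 1 << (14 - width)
```

For example, `device_location("RAM", 0x6010)` and `device_location("RAM", 0x4010)` both return 0x10. That is the data-corruption case described above for software that assumes 16 kB of RAM.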

When the MPU wants to read data from a particular memory location, it drives that address onto 
A[15:0]. This causes the address decoder to update its chip select outputs, which enables the appro- 
priate memory chip or the serial port. After allowing time for the chip select to propagate, the RD* 
signal is asserted, and the WR* signal is left unasserted. This informs the selected device that a read 
is requested. The device is then able to drive the data bus, D[7:0], with the requested data. After al- 
lowing some time for the read data to be driven, the MPU captures the data and releases the RD* sig- 
nal, ending the read request. The sequence of events, or timing, for the read transaction is shown in 
Fig. 3.8. 

FIGURE 3.8 MPU read timing. 

This type of MPU bus is asynchronous, because its sequence of events is not driven by a clock but 
rather by the assertion and removal of the various signals that are timed relative to one another by the 
MPU and the devices with which it is communicating. For this interface to work properly, the MPU 
must allow enough time for the read to occur, regardless of the specific device with which it is com- 
municating. In other words, it must operate according to the capabilities of the slowest device, the 
least common denominator. 
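The "slowest device" rule can be expressed as a worst-case sum. This Python sketch assumes the read time is the decoder delay, plus the slowest device's access time, plus the MPU's data setup time. All nanosecond figures below are hypothetical placeholders, not datasheet values.

```python
import math

# Hypothetical timing figures in nanoseconds; real values come from datasheets.
DECODER_DELAY_NS = 15   # address valid to chip select valid
MPU_SETUP_NS = 20       # data must be valid this long before the MPU captures it
ACCESS_TIME_NS = {"ROM": 150, "RAM": 100, "SERIAL": 250}

def min_read_time_ns(access_times=ACCESS_TIME_NS):
    """Worst-case address-valid-to-capture time; the slowest device governs."""
    return DECODER_DELAY_NS + max(access_times.values()) + MPU_SETUP_NS

def read_cycles(clock_hz, access_times=ACCESS_TIME_NS):
    """Whole MPU clock periods needed to cover that worst case."""
    period_ns = 1e9 / clock_hz
    return math.ceil(min_read_time_ns(access_times) / period_ns)
```

With these placeholder numbers, the serial port sets the pace at 285 ns. That fits in one period of the 1-MHz clock but would need three periods at 10 MHz.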

Write timing is very similar, as seen in Fig. 3.9. Again, the MPU drives the desired address onto 
A[15:0], and the appropriate chip select is decoded. At the same time, the write data is driven onto 
D[7:0]. Once the address and data have had time to stabilize, and after allowing time for the chip 
select to propagate, WR* is asserted to actually trigger the write. WR* is de-asserted while data, 
address, and chip select are still stable so that there is no possibility of writing to a different location 
or corrupting data. If WR* were de-asserted at the same time as the other signals changed, a race 
condition could develop wherein a particular device may sense the address (or data or chip select) 
change just prior to WR* changing, resulting in a false write to another location or to the current 
location with wrong data. Because the interface is asynchronous, every signal assertion must last long 
enough for all devices to execute the write properly. 

FIGURE 3.9 MPU write timing. 

The serial port controller asserts an MPU interrupt signal to simplify programming of the serial port 
communication routine. Rather than having software continually poll the serial port to see if data 
are waiting, the controller is configured to assert INTR* whenever a new byte arrives. The MPU is 
then able to invoke an ISR, which can transfer the data byte from the serial port to the RAM. The 
interrupt also helps when transmitting data, because the speed of the typical serial port (often 9,600 
to 38,400 bps) is very slow compared to the clock speed of even a slow MPU (1 to 10 MHz). When 
the software wants to send a set of bytes out the serial port, it must send one byte and then wait a 
relatively long time until the serial port is ready for the next byte (at 9,600 bps, assuming 10 bits per 
character including start and stop bits, about 1 ms, or roughly 1,000 cycles of a 1-MHz clock). 
Instead of polling in a loop between bytes, the serial port controller asserts INTR* when it is time to 
send the next byte. The ISR can then respond with the next byte and return control to the main 
program that is running at the time. Each time INTR* is asserted and the ISR responds, the ISR must 
be sure to clear the interrupt condition in the serial port. Depending on the exact serial port device, a 
read or write to a specific register will clear the interrupt. If the interrupt is not cleared before the 
ISR issues a return-from-interrupt, the MPU may be falsely interrupted again for the same condition. 
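The receive side of this scheme can be sketched as a toy model. It assumes that reading the receive data register removes a byte, and that INTR* stays asserted while unread bytes remain. Real devices differ, which is why the device's documentation governs which register access clears the interrupt. `SerialPort` and `serial_isr` are hypothetical names.

```python
from collections import deque

class SerialPort:
    """Toy receive-side serial controller (behavior assumed, not a real part)."""

    def __init__(self):
        self.rx = deque()  # bytes received but not yet read by the MPU

    def byte_arrives(self, value):
        self.rx.append(value)

    @property
    def intr_asserted(self):
        """Models INTR* being driven active while data are waiting."""
        return bool(self.rx)

    def read_data(self):
        """Reading the data register is the access that clears the condition here."""
        return self.rx.popleft()

def serial_isr(port, ram_buffer):
    """Copy every waiting byte to RAM so INTR* is released before returning."""
    while port.intr_asserted:
        ram_buffer.append(port.read_data())
    # A return-from-interrupt would follow here; INTR* is no longer asserted.
```

Draining the port in a loop, instead of reading a single byte, also covers the case where a second byte arrives while the ISR is running.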

This computer contains two other functional elements: the clock and reset circuits. The 1-MHz 
clock must be supplied to the MPU continually for proper operation. In this example design, no 
other components in the computer require this clock. For fairly simple computers, this is a realistic 
scenario, because the buses and memory devices operate asynchronously. Many other computers, 
however, have synchronous buses, and the microprocessor clock must be distributed to other compo- 
nents in the system. 

The reset circuit exists to start the MPU when the system is first turned on. Reset must be applied 
for a certain minimum duration after the power supply has stabilized. This ensures that the digital 
circuits properly settle to known states before they are released from reset and allowed to begin 
normal operation. As the computer is turned on, the reset circuit actively asserts RST*. Once power 
has stabilized and the minimum reset duration has elapsed, RST* is de-asserted and remains in this 
state indefinitely. 



3.6 ADDRESS BANKING 



A microprocessor’s address space is normally limited by the width of its address bus, but supple- 
mental logic can greatly expand address space, subject to certain limitations. Address banking is a 
technique that increases the amount of memory a microprocessor can address. If an application re- 
quires 1 MB of RAM for storing large data structures, and an 8-bit microprocessor is used with a 
64-kB address space, address banking can enable the microprocessor to access the full 1 MB one 
small section at a time. 

Address banking, also known as paging, takes a large quantity of memory, divides it into multiple 
smaller banks, and makes each bank available to the microprocessor one at a time. A bank address 
register is maintained by the microprocessor and determines which bank of memory is selected at 
any given time. The selected bank is accessed through a portion of the microprocessor's fixed 
address space, called a window, set aside for banked memory access. As shown in Fig. 3.10a, the 
upper 16 kB of address space provides direct access to one of many 16-kB pages in the larger banked 
memory structure. Figure 3.10b shows the logical implementation of this banked memory scheme. A 
22-bit combined address is sent to the 4-MB banked memory structure: 256 pages × 16 kB per page 
= 4 MB. These 22 bits are formed through the concatenation of the 8-bit bank address register and 
14 of the microprocessor's low-order address bits, A[13:0]. The eight bank-address bits change 
infrequently, only when the microprocessor is ready for a new page in memory. The 14 
microprocessor address bits can change each time the window is accessed. 

FIGURE 3.10 Address banking. 
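The concatenation in Fig. 3.10b is a shift and an OR. This Python sketch assumes the window occupies 0xC000-0xFFFF, the upper 16 kB described above. The name `banked_address` is hypothetical.

```python
WINDOW_BASE = 0xC000  # upper 16 kB of the 64-kB space serves as the banking window

def banked_address(bank_register, cpu_address):
    """Form the 22-bit banked-memory address: bank[7:0] concatenated with A[13:0]."""
    if not 0 <= bank_register <= 0xFF:
        raise ValueError("bank address register is 8 bits")
    if not WINDOW_BASE <= cpu_address <= 0xFFFF:
        raise ValueError("address is outside the banking window")
    return (bank_register << 14) | (cpu_address & 0x3FFF)
```

Bank 0xFF with CPU address 0xFFFF yields 0x3FFFFF, the last byte of the 4-MB structure (256 × 16 kB).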

The details of a banking scheme can be modified according to the application’s requirements. The 
bank access window can be increased or decreased, and more or fewer pages can be defined. If an 
application operates on many small sets of data, a larger number of smaller pages may be suitable. If 
the data or software set is widely dispersed, it may be better to increase the window size as much as 
possible to minimize the bank address register update rate. 

While address banking can greatly increase the memory available to a microprocessor, it does so 
with the penalties of increased access time on page switches and more complexity in managing the 
segmented address space. Each time the microprocessor wants to access a location in a different 
page, it must update the bank address register. This penalty is acceptable in some applications. 
However, if the application requires both consistently fast access time and large memory size, a 
faster, more expensive microprocessor with a larger native address space may be required. 

The complexity of managing the segmented address space dissuades some engineers from em- 
ploying address banking. Software usually bears the brunt of recognizing when necessary data re- 
sides in a different page and then updating the bank address register to access that page. It is easier 
for software to deal with a large, continuous address space. With the easy availability and low cost of 
32-bit microprocessors, address banking is not as common as it used to be. However, if an 8-bit 
microprocessor must be used because of cost or other constraints, address banking may be useful 
when memory demands increase beyond 64 kB. 
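The software burden described above can be illustrated with a helper that turns a flat 22-bit address into a bank register update plus a window access. This Python sketch uses a `bytearray` to stand in for the 4-MB banked memory. It counts bank register writes to make the page-switch penalty visible. The class and method names are hypothetical.

```python
class BankedMemory:
    """Software view of the 256 x 16-kB banking scheme (illustrative model)."""

    PAGE_SIZE = 0x4000
    WINDOW_BASE = 0xC000

    def __init__(self):
        self.storage = bytearray(256 * self.PAGE_SIZE)  # the 4-MB banked array
        self.bank_register = 0
        self.bank_writes = 0  # each one is a page-switch penalty

    def _cpu_read(self, cpu_address):
        """What the hardware does for an MPU read inside the window."""
        offset = cpu_address - self.WINDOW_BASE
        return self.storage[self.bank_register * self.PAGE_SIZE + offset]

    def read_linear(self, linear):
        """Read a byte by flat address, switching pages only when necessary."""
        if not 0 <= linear < len(self.storage):
            raise ValueError("address beyond 4 MB")
        page, offset = divmod(linear, self.PAGE_SIZE)
        if page != self.bank_register:  # software must notice the page change
            self.bank_register = page
            self.bank_writes += 1
        return self._cpu_read(self.WINDOW_BASE + offset)
```

Sequential accesses within one page cost nothing extra. Alternating between two pages forces a bank register write on every access, which is the pattern that makes banking unattractive for widely dispersed data.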



3.7 DIRECT MEMORY ACCESS 



Transferring data from one region of memory to another is a common task performed within a com- 
puter. Incoming data may be transferred from a serial communications controller into memory, and 
outgoing data may be transferred from memory to the controller. Memory-to-memory transfers are 
common, too, as data structures are moved between subprograms, each of which may have separate 
regions of memory set aside for its private use. The speed with which memory is transferred nor- 
mally depends on the time that the microprocessor takes to perform successive read and write opera- 
tions. Each byte transferred requires several microprocessor operations: load accumulator, store 
accumulator, update address for next byte, and check if there is more data. Instead of simply moving 
a stream of bytes without interruption, the microprocessor is occupied mostly by the overhead of 
calculating new addresses and checking to see if more data is waiting. Computers that perform a 
high volume of memory transfers may exhibit performance bottlenecks as a result of the overhead of 
having the microprocessor spend too much of its time reading and writing memory. 

Memory transfer performance can be improved using a technique called direct memory access, or 
DMA. DMA logic intercedes at the microprocessor’s request to directly move data between a source 
and destination. A DMA controller (DMAC) sits on the microprocessor bus and contains logic that is 
specifically designed to rapidly move data without the overhead of simultaneously fetching and de- 
coding instructions. When the microprocessor determines that a block of data is ready to move, it 
programs the DMAC with the starting address of the source data, the number of bytes to move, and 
the starting address of the destination data. When the DMAC is triggered, the microprocessor tem- 
porarily relinquishes control of its bus so the DMAC can take over and quickly move the data. The 
DMAC serves as a surrogate processor by directly generating addresses and reading and writing 
data. From the microprocessor bus perspective, nothing has changed, and data transfers proceed 
normally despite being controlled by the DMAC rather than the microprocessor. Figure 3.11 shows 
the basic internal structure of a DMAC. 
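The three values the microprocessor programs into a DMAC channel map naturally onto three registers. The transfer itself then needs no instruction fetches. This Python sketch models a memory/memory copy. The register and method names are hypothetical, and overlapping source and destination blocks are not handled.

```python
class DmaChannel:
    """One DMAC channel: source address, destination address, byte count."""

    def __init__(self, memory):
        self.memory = memory  # bytearray standing in for system memory
        self.src = self.dst = self.count = 0

    def program(self, src, dst, count):
        """The microprocessor writes the channel registers."""
        self.src, self.dst, self.count = src, dst, count

    def run(self):
        """Move the block: one read and one write per byte, with the address
        updates done by counters rather than by microprocessor instructions."""
        while self.count:
            self.memory[self.dst] = self.memory[self.src]
            self.src += 1
            self.dst += 1
            self.count -= 1
```

After `run()` returns, the count register is zero. That is the condition on which a DMAC would typically raise its completion interrupt.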

A DMA transfer can be initiated by either the microprocessor or an I/O device that contains logic 
to assert a request to the DMAC. DMA transfers are generally broken into two categories: 
peripheral/memory and memory/memory. Peripheral/memory transfers move data to a peripheral or 
retrieve data from a peripheral. A peripheral/memory transfer can be triggered by a DMA-aware I/O 
device when it is ready to accept more outgoing data or when incoming data has arrived. These are 
called single-address transfers, because the DMAC typically controls only a single address, that of 
the memory side of the transfer. The peripheral address is typically a fixed offset into its register set 
and is asserted by supporting control logic that assists in the connectivity between the peripheral and 
the DMAC. 

DMA transfers do not have to be continuous, and in the case of a peripheral transfer they often are 
not. If the microprocessor sets up a DMA transfer from a serial communications controller to 
memory, it programs the DMAC to write a certain quantity of data into memory. However, the trans- 
fer does not begin until the serial controller asserts a DMA request indicating that data is ready. 
When this request occurs, the DMAC arbitrates for access to the microprocessor bus by asserting a 
bus request. Some time later, the microprocessor or its support logic will grant the bus to the DMAC 
and temporarily pause the microprocessor's bus activity. The DMAC can then transfer a single unit 
of data from the serial controller into memory. The unit of data transfer may be any number of bytes. 
When finished, the DMAC relinquishes control of the bus back to the microprocessor. 
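The request-driven, single-address flow can be sketched as follows. Each peripheral request causes one bus request/grant, moves one byte, and returns the bus. Only the memory address is generated by the channel. The peripheral's data register is modeled as a FIFO, and each unit of transfer is assumed to be one byte. The class and attribute names are hypothetical.

```python
from collections import deque

class PeripheralToMemoryChannel:
    """Single-address DMA channel: only the memory-side address is generated."""

    def __init__(self, memory, rx_fifo):
        self.memory = memory    # bytearray standing in for RAM
        self.rx_fifo = rx_fifo  # peripheral data register, modeled as a FIFO
        self.dst = self.remaining = 0
        self.bus_grants = 0

    def program(self, dst, count):
        self.dst, self.remaining = dst, count

    def service_request(self):
        """Handle one DMA request from the peripheral; return True if a byte moved."""
        if self.remaining == 0 or not self.rx_fifo:
            return False
        self.bus_grants += 1  # bus request, then grant; MPU bus activity paused
        self.memory[self.dst] = self.rx_fifo.popleft()
        self.dst += 1
        self.remaining -= 1
        return True  # bus relinquished back to the microprocessor

    @property
    def done(self):
        """All requested data moved; a DMAC would now interrupt the MPU."""
        return self.remaining == 0
```

Between requests the microprocessor owns the bus, so a slow serial stream costs only one brief bus grant per byte.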

Memory/memory transfers move data from one region in memory to another. These are called 
dual-address transfers, because the DMAC controls two addresses into memory — source and desti- 
nation. Memory/memory transfers are triggered by the microprocessor and can execute continu- 
ously, because the data block to be moved is ready and waiting in memory. 

Even when DMA transfers execute one byte at a time, they are still more efficient than transfers 
performed by the microprocessor. The DMAC moves a byte or word (per the microprocessor's data bus 
width) in a single bus cycle for a single-address transfer, or in one read and one write for a 
memory/memory transfer, rather than through the microprocessor's load/store instruction sequence 
and its additional overhead. There is some initial overhead in setting up the DMA transfer, so it is 
not efficient to use DMA for very short transfers. If the microprocessor needs to move only a few 
bytes, it should probably do so on its own. However, the DMAC initialization overhead is more than 
compensated for if dozens or hundreds of bytes are being moved. 
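The break-even point follows from comparing setup + dma·n with cpu·n cycles for an n-byte transfer. This Python sketch takes all cycle counts as inputs. The values in the usage line are hypothetical and chosen only to illustrate the "dozens of bytes" regime.

```python
def dma_break_even(setup_cycles, cpu_cycles_per_byte, dma_cycles_per_byte):
    """Smallest transfer length in bytes for which DMA beats a CPU copy loop.

    Solves setup + dma * n < cpu * n, i.e. n > setup / (cpu - dma).
    Returns None if DMA is never faster.
    """
    saving = cpu_cycles_per_byte - dma_cycles_per_byte
    if saving <= 0:
        return None
    return setup_cycles // saving + 1
```

With an assumed 400 cycles of setup and completion handling, 20 cycles per byte for the CPU loop (load, store, address update, test), and 2 bus cycles per byte for the DMAC, `dma_break_even(400, 20, 2)` gives 23 bytes.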
FIGURE 3.11 DMA controller block diagram. 

A typical DMAC supports multiple channels, each of which controls a different DMA transfer. 
While only one transfer can execute at any given moment, multiple transfers can be interleaved to 
prevent one peripheral from being starved for data while another is being serviced. Because a typical 
peripheral transfer is not continuous, a separate DMA channel can be assigned to each active 
peripheral. 
A DMAC can have one channel configured to load incoming data from a serial controller, another to 
store data to a disk drive controller, and a third to move data from one region of memory to another. 
Once initialized by the microprocessor, the exact order and interleaving of multiple channels is re- 
solved by the individual DMA request signals, and any priority information is stored in the DMAC. 

When a DMAC channel has completed transferring the requested quantity of data, the DMAC as- 
serts an interrupt to the microprocessor to signal that the data has been moved. At this point, the mi- 
croprocessor can restart a new DMA transfer if desired and invoke any necessary routines to process 
data that has been moved. 

External DMA support logic may be necessary, depending on the specific DMAC, microproces- 
sor, and peripherals that are being used. Some microprocessors contain built-in DMAC arbitration 
logic. Some peripherals contain built-in DMA request logic, because they are specifically designed 
for these high-efficiency memory transfers. Custom arbitration logic typically functions by waiting 
for the DMAC to request the bus and then pausing the microprocessor’s bus transfers until the 
DMAC relinquishes the bus. This pause operation is performed according to the specifications of the 
particular microprocessor. Custom peripheral control logic can include DMAC read/write interface 
logic to assert the correct peripheral address when a transfer begins and perform any other required 
mapping between the DMAC’s transfer enable signaling and the peripheral’s read/write interface. 



3.8 EXTENDING THE MICROPROCESSOR BUS 



A microprocessor bus is intended to directly connect to memory and I/O devices that are in close 
proximity to the microprocessor. As such, its electrical and functional properties are suited for rela- 
tively short interconnecting wires and relatively simple device interfaces that respond with data soon 
after the microprocessor issues a request. Many computers, however, require some mechanism to ex- 
tend the microprocessor bus so that additional hardware, such as plug-in expansion cards or memory 
modules, can enhance the system with new capabilities. Supporting these modular extensions to the 
computer’s architecture can be relatively simple or quite complex, depending on the required degree 
of expandability and the physical distances across which data must be communicated. 

Expansion buses are generally broken into two categories, memory and I/O, because these 
groups’ respective characteristics are usually quite different. General-purpose memory is a high- 
bandwidth resource to which the microprocessor requires immediate access so that it can maintain a 
high level of throughput. Memory is also a predictable and regular structure, both logically and 
physically. If more RAM is added to a computer, it is fairly certain that some known number of 
chips will be required for a given quantity of memory. In contrast, I/O by nature is very diverse, and 
its bandwidth requirements are usually lower than those of memory. I/O expansion usually involves 
cards of differing complexity and architecture as a result of the wide range of interfaces that can be 
supported (e.g., disk drive controller versus serial port controller). Therefore, an I/O expansion bus 
must be flexible enough to interface with a varying set of modules, some of which may not have 
been conceived of when the computer was first designed. 

Memory expansion buses are sometimes direct extensions of the microprocessor bus. From the 
preceding 8-bit computer example, the upper 16 kB of memory could be reserved for future expan- 
sion. A provision for future expansion could be as simple as adding a connector or socket for an ex- 
tra memory chip. In this case, no special augmentation of the microprocessor bus is required. 
However, in a larger system with more address space, provisions must be made for more than one 
additional memory chip. In these situations, a simple buffered extension of the microprocessor bus 
may suffice. A buffer, in this context, is an IC that passes data from one set of pins to another, 
thereby electrically separating two sections of a bus. As shown in Fig. 3.12, a buffer can extend a mi- 
croprocessor bus so that its logical functionality remains unchanged, but its electrical characteristics 
are enhanced to provide connectivity across a greater distance (to a multichip memory expansion 
module). A unidirectional address buffer extends the address bus from the microprocessor to expan- 
sion memory devices. A bidirectional data buffer extends the bus away from the microprocessor on 
writes and toward the microprocessor on reads. The direction of the data buffer is controlled accord- 
ing to the state of read/write enable signals generated by the microprocessor. 
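The direction control for the bidirectional data buffer is a small combinational function of the strobes. This Python sketch treats RD* and WR* as active-low (0 = asserted). It assumes the buffer is enabled only when the address decodes to the expansion module, so the buffer never fights on-board devices during their reads. Some designs enable the buffer differently. The function name and the `expansion_selected` input are hypothetical.

```python
def data_buffer_control(rd_n, wr_n, expansion_selected):
    """Return (enabled, direction) for a bidirectional data buffer.

    rd_n, wr_n: active-low read/write strobes from the MPU (0 = asserted).
    expansion_selected: True when the address decodes to the expansion module.
    direction: 'to_mpu' on reads, 'from_mpu' on writes, None when disabled.
    """
    if not expansion_selected or (rd_n and wr_n):
        return (False, None)  # buffer outputs tri-stated
    if rd_n == 0 and wr_n == 0:
        raise ValueError("RD* and WR* must not be asserted together")
    return (True, "to_mpu" if rd_n == 0 else "from_mpu")
```

Disabling the buffer whenever neither strobe is active keeps it from driving the bus between cycles, the same rule the memory chips follow with their tri-state outputs.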

More complex memory structures may contain dedicated memory control logic that sits between 
the microprocessor and the actual memory devices. Expanding such a memory architecture is gener- 
ally accomplished by augmenting the “back-side” memory device bus as shown in Fig. 3.13 rather 
than by adding additional controllers onto an extended microprocessor bus. Such an expansion 
scheme may or may not require buffers, depending on the electrical characteristics of the bus in 
question. 

I/O buses may also be direct extensions of the microprocessor bus. The original expansion bus in 
the IBM PC, developed in the early 1980s, is essentially an extended Intel 8088 microprocessor bus 
that came to be known as the Industry Standard Architecture (ISA) bus. Each I/O card on the ISA 
bus is mapped into a unique address range in the microprocessor's address space. Therefore, when software 
wants to read or write a register on an I/O card, it simply performs an access to the desired location. 
The ISA bus added a few features beyond the raw 8088 bus, including DMA and variable wait states 
for slow I/O devices. A wait state results when a device cannot immediately respond to the micro- 
processor’s request and asserts a signal to stretch the access so that it can respond properly. 




FIGURE 3.12 Buffered microprocessor bus for memory expansion. 




FIGURE 3.13 Extended memory controller bus. 
Direct extensions such as the ISA bus are fairly easy to implement and serve well in applications 
where I/O response time does not unduly restrict microprocessor throughput. As computers have 
gotten faster, the throughput of microprocessors has rapidly outstripped the response times of all but 
the fastest I/O devices. In comparison to a modern microprocessor, a hard-disk controller is rather 
slow, with response times measured in microseconds rather than nanoseconds. Additionally, as bus 
signals become faster, the permissible length of interconnecting wires decreases, limiting the bus's 
expandability. These and other characteristics motivate the decoupling of the microprocessor's local 
bus from the computer's I/O bus. 

An I/O bus can be decoupled from the microprocessor bus by inserting an intermediate bus con- 
troller between them that serves as an interface, or translator, between the two buses. Once the buses 
are separated, activity on one bus does not necessarily obstruct activity on the other. If the micropro- 
cessor wants to write a block of data to a slow device, it can rapidly transfer that data to the bus con- 
troller and then continue with other operations at full speed while the controller slowly transfers the 
data to the I/O device. This mechanism is called a posted-write, because the bus controller allows the 
microprocessor to complete, or post, its write before the write actually completes. Separate buses 
also open up the possibility of multiple microprocessors or logic elements performing I/O operations 
without conflicting with the central microprocessor. In a multimaster system, a specialized DMA 
controller can transfer data between two peripherals such as disk controllers while the microproces- 
sor goes about its normal business. 
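A posted-write can be modeled as a FIFO between the two buses. The microprocessor's write completes as soon as it lands in the FIFO, and the slow I/O side drains entries later. This Python sketch assumes a hypothetical FIFO depth and names. It omits the ordering rules a real bridge must enforce, such as keeping later reads behind earlier posted writes.

```python
from collections import deque

class PostingBridge:
    """Bus controller that posts writes from a fast bus to a slow I/O bus."""

    def __init__(self, depth=4):
        self.fifo = deque()
        self.depth = depth
        self.io_registers = {}  # state of devices on the slow bus

    def mpu_write(self, address, value):
        """Return True if the write is posted immediately, False if the MPU must stall."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append((address, value))
        return True

    def io_side_tick(self):
        """One slow I/O-bus write completes."""
        if self.fifo:
            address, value = self.fifo.popleft()
            self.io_registers[address] = value
```

The microprocessor runs at full speed only while the FIFO has room. A burst longer than the FIFO depth still stalls it until the I/O side catches up.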

The Peripheral Component Interconnect (PCI) bus is the industry-standard follow-on to the ISA 
bus, and it implements such advanced features as posted-writes, multiple bus masters, and multiple bus 
segments. Each PCI bus segment is separated from the others via a PCI bridge chip. Only traffic that 
must travel between buses crosses a bridge, thereby reducing congestion on individual PCI bus seg- 
ments. One segment can be involved in a data transfer between two devices without affecting a si- 
multaneous transfer between two other devices on a different segment. These performance- 
enhancing features do not come for free, however. Their cost is manifested by the need for dedicated 
PCI control logic in bridge chips and in the I/O devices themselves. It is generally simpler to imple- 
ment an I/O device that is directly mapped into the microprocessor's memory space, but the overall 
performance of the computer may suffer under demanding applications. 



3.9 ASSEMBLY LANGUAGE AND ADDRESSING MODES 



With the hardware ready, a computer requires software to make it more than an inactive collection of 
components. Microprocessors fetch instructions from program memory, each consisting of an op- 
code and, optionally, additional operands following the opcode. These opcodes are binary data that 
are easy for the microprocessor to decode, but they are not very readable by a person. To enable a 
programmer to more easily write software, an instruction representation called assembly language 
was developed. Assembly language is a low-level language that directly represents each binary op- 
code with a human-readable text mnemonic. For example, the mnemonic for an unconditional 
branch-to-subroutine instruction could be BSR. In contrast, a high-level language such as C++ or 
Java contains more complex logical expressions that may be automatically converted by a compiler 
to dozens of microprocessor instructions. Assembly language programs are assembled, rather than 
compiled, into opcodes by directly translating each mnemonic into its binary equivalent. 

Assembly language also makes programming easier by enabling the usage of text labels in place 
of hard-coded addresses. A subroutine can be named FOO, and when BSR FOO is encountered by 
the assembler, a suitable branch target address will be automatically calculated in place of the label 
FOO. Each type of assembler requires a slightly different format and syntax, but there are general as- 
sembly language conventions that enable a programmer to quickly adapt to specific implementations 
once the basics are understood. An assembly language program listing usually has three columns of
text followed by an optional comment column as shown in Fig. 3.14. The first column is for labels 
that are placeholders for addresses to be resolved by the assembler. Instruction mnemonics are lo- 
cated in the second column. The third column is for instruction operands. 

This listing uses the Motorola 6800 family's assembly language format. Though developed in the 
1970s, 68xx microprocessors are still used today in embedded applications such as automobiles and 
industrial automation. The first line of this listing is not an instruction, but an assembler directive 
that tells the assembler to locate the program at memory location $100. When assembled, the listing 
is converted into a memory dump that lists a range of memory addresses and their corresponding 
contents — opcodes and operands. Assembler directives are often indicated with a period prefix. 
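Here is a minimal two-pass assembler for the Fig. 3.14 listing, written in Python, to make the assembly step concrete:

- **First pass.** It walks the listing and records the address of each label.
- **Second pass.** It emits opcodes and operands, computing each branch offset relative to the address of the next instruction.

The opcode values are the 6800's encodings for the inherent, immediate, and relative forms used in the listing. Because SEND_CHAR is not shown, its address of $0120 is a hypothetical placeholder. The parser understands only the handful of operand forms the listing needs.

```python
# 6800 opcodes for the instruction forms used in Fig. 3.14.
INHERENT = {"CLRA": 0x4F, "INCA": 0x4C}               # 1 byte: opcode only
IMMEDIATE = {"CMPA": 0x81, "LDAA": 0x86}              # 2 bytes: opcode, literal
RELATIVE = {"BNE": 0x26, "BSR": 0x8D, "BRA": 0x20}    # 2 bytes: opcode, offset

# (label, mnemonic, operand) for each line of Fig. 3.14; comments omitted.
LISTING = [
    (None, ".ORIG", "$100"),
    ("BEGIN", "CLRA", None),
    ("INC_LOOP", "INCA", None),
    (None, "CMPA", "#$1E"),
    (None, "BNE", "INC_LOOP"),
    (None, "LDAA", "#'Z'"),
    (None, "BSR", "SEND_CHAR"),
    (None, "BRA", "BEGIN"),
]


def parse_value(text):
    """Parse a $hex, 'c' character, or decimal literal."""
    if text.startswith("$"):
        return int(text[1:], 16)
    if len(text) == 3 and text[0] == text[2] == "'":
        return ord(text[1])
    return int(text)


def assemble(listing, external_symbols):
    """Two-pass assembly; returns a {address: byte} memory dump."""
    # Pass 1: assign an address to every label.
    symbols = dict(external_symbols)
    pc = 0
    for label, mnemonic, operand in listing:
        if mnemonic == ".ORIG":
            pc = parse_value(operand)
            continue
        if label:
            symbols[label] = pc
        pc += 1 if mnemonic in INHERENT else 2

    # Pass 2: emit opcode and operand bytes.
    dump = {}
    pc = 0
    for label, mnemonic, operand in listing:
        if mnemonic == ".ORIG":
            pc = parse_value(operand)
            continue
        if mnemonic in INHERENT:
            code = [INHERENT[mnemonic]]
        elif mnemonic in IMMEDIATE:
            code = [IMMEDIATE[mnemonic], parse_value(operand.lstrip("#"))]
        else:
            # The offset is relative to the address of the next instruction.
            offset = symbols[operand] - (pc + 2)
            if not -128 <= offset <= 127:
                raise ValueError(f"branch to {operand} is out of range")
            code = [RELATIVE[mnemonic], offset & 0xFF]
        for byte in code:
            dump[pc] = byte
            pc += 1
    return dump


dump = assemble(LISTING, {"SEND_CHAR": 0x0120})   # hypothetical SEND_CHAR address
print(f"${min(dump):04X}:", " ".join(f"{dump[a]:02X}" for a in sorted(dump)))
# $0100: 4F 4C 81 1E 26 FB 86 5A 8D 16 20 F4
```

A full assembler also handles the remaining addressing modes, more directives, and error reporting. The two-pass structure is what lets an instruction refer to a label that appears later in the listing.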

The program in Fig. 3.14 is very simple: it counts to 30 ($1E) and then sends the “Z” character 
out the serial port. It continues in an infinite loop by returning to the start of the program when the 
serial port routine has completed its task. The subroutine to handle the serial port is not shown and is 
referenced with the SEND_CHAR label. The program begins by clearing accumulator A (the 6800 
has two accumulators: ACCA and ACCB). It then enters an incrementing loop where the accumulator is incremented and then compared against the terminal count value, $1E. The # prefix tells the assembler to use the literal value $1E for the comparison. Other alternatives are possible and will soon
be discussed. If ACCA is unequal to $1E, the microprocessor goes back to increment ACCA. If 
equal, the accumulator is loaded with the ASCII character to be transmitted, also a literal operand. 
The assumption here is that the SEND_CHAR subroutine transmits whatever is in ACCA. When the 
subroutine finishes, the program starts over with the branch-always instruction. 
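The loop's control flow depends on two instructions:

- **CMPA** subtracts its operand from ACCA and keeps only the resulting condition flags.
- **BNE** tests the zero (Z) flag.

The Python sketch below traces one pass through the program at that level of detail. It models only the Z flag. The `send_char` argument is a caller-supplied stand-in for the SEND_CHAR subroutine, which the listing does not show.

```python
def run_once(send_char):
    """Trace one pass of the Fig. 3.14 program; return the number of INCAs."""
    acca = 0                                    # CLRA
    increments = 0
    while True:
        acca = (acca + 1) & 0xFF                # INCA: 8-bit register wraps at $FF
        increments += 1
        z_flag = ((acca - 0x1E) & 0xFF) == 0    # CMPA #$1E: subtract, keep flags only
        if not z_flag:                          # BNE INC_LOOP: branch while Z is clear
            continue
        acca = ord("Z")                         # LDAA #'Z'
        send_char(acca)                         # BSR SEND_CHAR (stand-in)
        return increments                       # BRA BEGIN would start the next pass


sent = []
count = run_once(sent.append)
print(count, sent)    # 30 [90]
```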

Each of the instructions in the preceding program contains at least one operand. CLRA and INCA 
have only one operand: ACCA. CMPA and LDAA each have two operands: ACCA and associated 
data. Complex microprocessors may reference three or more operands in a single instruction. Some 
instructions can reference different types of operands according to the requirements of the program 
being implemented. Both CMPA and LDAA reference literal operands in this example, but a pro- 
grammer cannot always specify a predetermined literal data value directly in the instruction sequence. 

Operands can be referenced in a variety of manners, called addressing modes, depending on the 
type of instruction and the type of operand. Some types of instructions inherently use only one ad- 
dressing mode, and some types have multiple modes. The manners of referencing operands can be 
categorized into six basic addressing modes: implied, immediate, direct, relative, indirect, and in- 
dexed. To fully understand how a microprocessor works, and to efficiently utilize an instruction set, 
it is necessary to explore the various mechanisms used to reference data. 

• Implied addressing specifies the operand of an instruction as an inherent property of that instruc- 
tion. For example, CLRA implies the accumulator by definition. No additional addressing infor- 
mation following the opcode is needed. 





          .ORIG  $100
BEGIN     CLRA
INC_LOOP  INCA
          CMPA   #$1E        ; compare ACCA = $1E
          BNE    INC_LOOP    ; if not equal, go back
          LDAA   #'Z'        ; else, load ASCII 'Z'
          BSR    SEND_CHAR   ; send ACCA to serial port
          BRA    BEGIN       ; start over again

FIGURE 3.14 Typical assembly language listing.

• Immediate addressing places an operand's value literally into the instruction sequence. LDAA #'Z' has its primary operand immediately available following the opcode. An immediate operand is indicated with the # prefix in some assembly languages. Eight-bit microprocessors with
eight-bit instruction words cannot fit an immediate value into the instruction word itself and, 
therefore, require that an extra byte following the opcode be used to specify the immediate value. 
More powerful 32-bit microprocessors can often fit a 16-bit or 24-bit immediate value within the 
instruction word. This saves an additional memory fetch to obtain the operand. 

• Direct addressing places the address of an operand directly into the instruction sequence. Instead of specifying LDAA #'Z', the programmer could specify LDAA $1234. This version of the instruction would tell the microprocessor to read memory location $1234 and load the resulting
value into the accumulator. The operand is directly available by looking into the memory address 
specified just following the instruction. Direct addressing is useful when there is a need to read a 
fixed memory location. Usage of the direct addressing mode has a slightly different impact on var- 
ious microprocessors. A typical 8-bit microprocessor has a 16-bit address space, meaning that two 
bytes following the opcode are necessary to represent a direct address. The 8-bit microprocessor 
will have to perform two additional 8-bit fetch operations to load the direct address. A typical 32-bit microprocessor has a 32-bit address space, meaning that four bytes following the opcode are necessary. If the 32-bit microprocessor has a 32-bit data bus, only one additional 32-bit fetch operation is required to load the direct address.

• Relative addressing places an operand’s relative address into the instruction sequence. A relative
address is expressed as a signed offset relative to the current value of the PC. Relative addressing 
is often used by branch instructions, because the target of a branch is usually within a short dis- 
tance of the PC, or current instruction. For example, BNE INC_LOOP results in a branch-if-not- 
equal backward by two instructions. The assembler automatically resolves the addresses and cal- 
culates a relative offset to be placed following the BNE opcode. This relative operation is per- 
formed by adding the offset to the PC. The new PC value is then used to resume the instruction 
fetch and execution process. Relative addressing can utilize both positive and negative deltas that 
are applied to the PC. A microprocessor's instruction format constrains the relative range that can 
be specified in this addressing mode. For example, most 8-bit microprocessors provide only an 8- 
bit signed field for relative branches, indicating a range of +127/-128 bytes. The relative delta 
value is stored into its own byte just after the opcode. Many 32-bit microprocessors allow a 16-bit 
delta field and are able to fit this value into the 32-bit instruction word, enabling the entire instruc- 
tion to be fetched in a single memory read. Limiting the range of a relative operation is generally 
not an excessive constraint because of software's locality property. Locality in this context means that the set of instructions involved in performing a specific task is generally located close together in memory. The locality property covers the great majority of branch instructions. For
those few branches that have their targets outside of the allowed relative range, it is necessary to 
perform a short relative branch to a long jump instruction that specifies a direct address. This re- 
duces the efficiency of the microprocessor by having to perform two branches when only one is 
ideally desired, but the overall efficiency of saving extra memory accesses for the majority of 
short branches is worth the trade-off. 

• Indirect addressing specifies an operand’s direct address as a value contained in another register. 
The other register becomes a pointer to the desired data. For example, a microprocessor with two 
accumulators can load ACCA with the value that is at the address in ACCB. LDAA (ACCB) 
would tell the microprocessor to put the value of accumulator B onto the address bus, perform a 
read, and put the returned value into accumulator A. Indirect addressing allows writing software 
routines that operate on data at different addresses. If a programmer wants to read or write an arbitrary entry in a data table, the software can load the address of that entry into a microprocessor
register and then perform an indirect access using that register as a pointer. Some microprocessors 
place constraints on which registers can be used as references for indirect addressing. In the case 
of a 6800 microprocessor, LDAA (ACCB) is not actually a supported operation but serves as a 
syntactical example for purposes of discussion. 

• Indexed addressing is a close relative (no pun intended) of indirect addressing, because it also re- 
fers to an address contained in another register. However, indexed addressing also specifies an off- 
set, or index, to be added to that register base value to generate the final operand address: base + 
offset = final address. Some microprocessors allow general accumulator registers to be used as 
base-address registers, but others, such as the 6800, provide special index registers for this pur- 
pose. In many 8-bit microprocessors, a full 16-bit address cannot be obtained from an 8-bit accu- 
mulator serving as the base address. Therefore, one or more separate index registers are present 
for the purpose of indexed addressing. In contrast, many 32-bit microprocessors are able to spec- 
ify a full 32-bit address with any general-purpose register and place no limitations on which regis- 
ter serves as the index register. Indexed addressing builds upon the capabilities of indirect 
addressing by enabling multiple address offsets to be referenced from the same base address. 
LDAA (X+$20) would tell the microprocessor to add $20 to the contents of the index register, X, and use the
resulting address to fetch data to be loaded into ACCA. One simple example of using indexed ad- 
dressing is a subroutine to add a set of four numbers located at an arbitrary location in memory. 
Before calling the subroutine, the main program can set an index register to point to the table of 
numbers. Within the subroutine, four individual addition instructions use the indexed addressing 
mode to add the locations X+0, X+1, X+2, and X+3. When so written, the subroutine is flexible
enough to be used for any such set of numbers. Because of the similarity of indexed and indirect 
addressing, some microprocessors merge them into a single mode and obtain indirect addressing 
by performing indexed addressing with an index value of zero. 
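The six modes differ only in where the operand comes from. The Python sketch below gathers them into a toy model. Memory is a 64 KB byte array, and the registers are a dictionary of values.

The function names, register names, and data layout are illustrative assumptions, not any particular microprocessor's behavior. In particular, the indirect case uses X as the pointer, because an 8-bit accumulator cannot hold a full 16-bit address. Relative mode is handled separately, because it produces a branch target rather than a data value.

```python
def operand(mode, field, regs, memory):
    """Return the operand value an instruction would use in the given mode.

    field holds whatever follows the opcode: a register name (implied,
    indirect), a literal value (immediate), an address (direct), or a
    (register, offset) pair (indexed).
    """
    if mode == "implied":
        return regs[field]                         # register named by the opcode itself
    if mode == "immediate":
        return field                               # value is in the instruction stream
    if mode == "direct":
        return memory[field]                       # instruction holds the address
    if mode == "indirect":
        return memory[regs[field]]                 # register holds the address
    if mode == "indexed":
        register, offset = field
        return memory[(regs[register] + offset) & 0xFFFF]   # base + offset
    raise ValueError(f"unknown addressing mode: {mode}")


def branch_target(pc_next, offset_byte):
    """Relative mode: sign-extend an 8-bit offset and add it to the PC."""
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (pc_next + offset) & 0xFFFF


memory = bytearray(0x10000)                        # 64 KB address space
memory[0x1234] = 0x42
memory[0x2000:0x2004] = bytes([1, 2, 3, 4])        # table of four numbers
regs = {"ACCA": 0x07, "X": 0x2000}

print(operand("implied", "ACCA", regs, memory))      # 7
print(operand("immediate", ord("Z"), regs, memory))  # 90
print(operand("direct", 0x1234, regs, memory))       # 66
# Indirect behaves like indexed addressing with an offset of zero.
print(operand("indirect", "X", regs, memory) == operand("indexed", ("X", 0), regs, memory))  # True
# Add the table at X+0 through X+3, as in the subroutine example.
print(sum(operand("indexed", ("X", i), regs, memory) for i in range(4)))  # 10
# The BNE at $104 in Fig. 3.14 (next instruction at $106) with offset $FB:
print(hex(branch_target(0x0106, 0xFB)))             # 0x101
```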

The six conceptual addressing modes discussed above represent the various logical mechanisms 
that a microprocessor can employ to access data. It is important to realize that each individual micro- 
processor applies these addressing modes differently. Some combine multiple modes into a single 
mode (e.g., indexed and indirect), and some will create multiple submodes out of a single mode. The 
exact variation depends on the specifics of an individual microprocessor’s architecture. 

With the various addressing modes modifying the specific opcode and operands that are presented 
to the microprocessor, the benefits of using assembly language over direct binary values can be ob- 
served. The programmer does not have to worry about calculating branch target addresses or resolv- 
ing different addressing modes. Each mnemonic can map to several unique opcodes, depending on 
the addressing mode used. For example, the LDAA instruction in Fig. 3.14 could easily have used extended addressing by specifying a full 16-bit address at which the ASCII transmit value is located.
Extended addressing is the 6800’s mechanism for specifying a 16-bit direct address. (The 6800's di- 
rect addressing involves only an eight-bit address.) In either case, the assembler would determine the 
correct opcode to represent LDAA and insert the correct binary values into the memory dump. Addi- 
tionally, because labels are resolved each time the program is assembled, small changes to the pro- 
gram can be made that add or remove instructions and labels, and the assembler will automatically 
adjust the resulting addresses accordingly. 
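The assembler's choice among LDAA's opcodes can be sketched as follows. The opcode values are the 6800's LDAA encodings for its immediate, direct, indexed, and extended forms. Two elements are assumptions made for illustration: the function name, and the "memory" pseudo-mode, which lets the assembler pick direct or extended addressing based on the address.

```python
# 6800 LDAA opcodes for each addressing mode.
LDAA_OPCODES = {"immediate": 0x86, "direct": 0x96, "indexed": 0xA6, "extended": 0xB6}


def encode_ldaa(mode, value):
    """Encode LDAA as an assembler might.

    For the "memory" pseudo-mode, direct (8-bit address) is chosen when the
    address fits in one byte and extended (16-bit address) otherwise.
    6800 indexed offsets are unsigned 8-bit values (0-255).
    """
    if mode == "memory":
        mode = "direct" if value <= 0xFF else "extended"
    opcode = LDAA_OPCODES[mode]
    if mode == "extended":
        return bytes([opcode, value >> 8, value & 0xFF])   # high byte first
    return bytes([opcode, value])   # one-byte operand: data, address, or offset


print(encode_ldaa("immediate", ord("Z")).hex())   # 865a
print(encode_ldaa("memory", 0x0040).hex())        # 9640
print(encode_ldaa("memory", 0x1234).hex())        # b61234
print(encode_ldaa("indexed", 0x20).hex())         # a620
```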

Programming in assembly language is different from using a high-level language, because one 
must think in smaller steps and have direct knowledge about the microprocessor’s operation and ar- 
chitecture. Assembly language is processor-specific instead of generic, as with a high-level lan- 
guage. Therefore, assembly language programming is usually restricted to special cases such as boot 
code or routines in which absolute efficiency and performance are demanded. A human programmer 
will usually be able to write more efficient assembly language than a high-level language compiler 
can generate. In large programs, the slight inefficiency of the compiler is well worth the trade-off for
ease of programming in a high-level language. However, time-critical routines such as I/O drivers or 
ISRs may benefit from manual assembly language coding. 




CHAPTER 4 

Memory 



Memory is as fundamental to computer architecture as any other element. The ability of a system's 
memory to transact the right quantity of data in the right span of time has a substantial impact on 
how that system fulfills its design goals. Digital engineers continually seek innovative ways to improve
memory density and bandwidth in a way that is tailored to a specific application's performance and 
cost constraints. 

Knowledge of prevailing memory technologies' strengths and weaknesses is a key requirement for 
designing digital systems. When a memory architecture is chosen that complements the rest of the system, a successful design moves much closer to fruition. Conversely, inappropriate memory architecture
can doom a good idea to the engineering doldrums of impracticality brought on by artificial complexity. 

This chapter provides an introduction to various solid-state memory technologies and explains 
how they work from an internal structural perspective as well as an interface timing perspective. A 
memory’s internal structure is important to an engineer, because it explains why that memory might 
be more suited for one application over another. Interface timing is where the rubber meets the road, 
because it defines how other elements in the system can access memory components' contents. The 
wrong interface on a memory chip can make it difficult for external logic such as a microprocessor 
to access that memory and still have time left over to perform the necessary processing on that data. 

Basic memory organization and terminology are introduced first. This is followed by a discussion 
of the prevailing read-only memory technologies: EPROM, flash, and EEPROM. Asynchronous 
SRAM and DRAM technologies, the foundations for practically all random-access memories, are 
presented next. These asynchronous RAMs are no longer on the forefront of memory technology but 
still find use in many systems. Understanding their operation not only enables their application, it 
also contributes to an understanding of the most recent synchronous RAM technologies. (High-performance synchronous memories are discussed later in the book.) The chapter concludes with a discussion of two types of specialty memories: multiport RAMs and FIFOs. Multiport RAMs and
FIFOs are found in many applications where memory serves less as a storage element and more as a 
communications channel between distinct logic blocks. 



4.1 MEMORY CLASSIFICATIONS



Microprocessors require memory resources in which to store programs and data. Memory can be 
classified into two broad categories: volatile and nonvolatile. Volatile memory loses its contents 
when power is turned off. Nonvolatile memory retains its contents indefinitely, even when there is no 
power present. Nonvolatile memory can be used to hold the boot code for a computer so that the microprocessor can have a place to get started. Once the computer begins initializing itself from nonvolatile memory, volatile memory is used to store dynamic variables, including the stack, as well as other programs that may be loaded from a disk drive. Figure 4.1 shows that a general memory device consists of a bit-storage array, address-decode logic, input/output logic, and control logic.
FIGURE 4.1 General memory device.



Despite the logical organization of the device, the internal bit array is usually less rectangular and 
more square in its aspect ratio. For example, a 131,072 x 8 memory (128 kB) may be implemented 
as 512 x 256 x 8. This aspect ratio minimizes the complexity of the address-decode logic and also 
has certain manufacturing process benefits. It takes more logic to generate 131,072 enable signals in 
one pass than to generate 512 and then 256 enables in two passes. The first decode is performed up- 
front in the memory array, and the second decode is performed by a multiplexer to pass the desired 
memory location. 
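The two-stage decode can be illustrated by splitting an address into the fields each decoder sees. This Python sketch assumes the 512 x 256 x 8 arrangement from the example. It also assumes that the upper nine address bits select the row and the lower eight drive the column multiplexer. Actual devices may map address bits differently.

```python
ROW_BITS = 9   # 2**9 = 512 rows, decoded in the array
COL_BITS = 8   # 2**8 = 256 columns, selected by a multiplexer


def split_address(address):
    """Split a 17-bit address into (row, column) for a 512 x 256 x 8 array."""
    if not 0 <= address < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address out of range for a 131,072 x 8 memory")
    row = address >> COL_BITS                  # first decode: one of 512 rows
    column = address & ((1 << COL_BITS) - 1)   # second decode: one of 256 columns
    return row, column


print(split_address(0x1ABCD))                  # (427, 205)
# Decoder outputs needed: one flat decode versus the two-stage decode.
print(1 << (ROW_BITS + COL_BITS), (1 << ROW_BITS) + (1 << COL_BITS))  # 131072 768
```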

Nonvolatile memory can be separated into two subcategories: devices whose contents are pro- 
grammed at a factory without the expectation of the data changing over time, and devices whose 
contents are loaded during system manufacture with anticipation of in-circuit updates during the life 
of the product. The former devices are, for all practical purposes, write-once devices that cannot be 
erased easily, if at all. The latter devices are designed primarily to be nonvolatile, but special cir- 
cuitry is designed into the devices to enable erasure and rewriting of the memory contents while the 
devices are functioning in a system. Most often, these circuits and their associated algorithms cause 
the erase/write cycle to be more lengthy and complex than simply reading the existing data out of the 
devices. This penalty on write performance reflects both the desire to secure the nonvolatile memory 
from accidental modification as well as the inherent difficulty in modifying a memory that is de- 
signed to retain its contents in the absence of power. 

Volatile memory can also be separated into two subcategories: devices whose contents are non- 
volatile for as long as power is applied (these devices are referred to as static) and devices whose 
contents require periodic refreshing to avoid loss of data even while power is present (these devices 
are referred to as dynamic). At first glance, the category of dynamic devices may seem absurd.
What possible benefit is there to a memory chip that cannot retain its memory without assistance? 
The benefit is significantly higher density of memory per unit silicon area, and hence lower cost of 
dynamic versus static memory. One downside to dynamic memory is somewhat increased system 
complexity to manage its periodic refresh requirement. An engineer must weigh the benefits and
complexities of each memory type when designing a system. Some systems benefit from one mem- 
ory type over the other, and some use both types in different proportions according to the needs of 
specific applications. 

Memory chips are among the more complex integrated circuits that are standardized across multi- 
ple manufacturers through cooperation with an industry association called the Joint Electron Device 
Engineering Council (JEDEC). Standardization of memory chip pin assignments and functionality is 
important, because most memory chips are commodities that derive a large portion of their value by 
being interoperable across different vendors. Newer memory technologies introduced in the 1990s
resulted in more proprietary memory architectures that did not retain the high degree of compatibil- 
ity present in other mainstream memory components. However, memory devices still largely con- 
form to JEDEC standards, making their use that much easier. 



4.2 EPROM 



Erasable programmable read-only memory, EPROM, is a basic type of nonvolatile memory that has been around since the early 1970s. During the 1970s and into the 1990s, EPROM accounted for the
majority of nonvolatile memory chips manufactured. EPROM maintained its dominance for decades 
and still has a healthy market share because of its simplicity and low cost: a typical device is pro- 
grammed once on an assembly line, after which it functions as a ROM for the rest of its life. An 
EPROM can be erased only by exposing its die to ultraviolet light for an extended period of time 
(typically, 30 minutes). Therefore, once an EPROM is assembled into a computer system, its con- 
tents are, for all practical purposes, fixed forever. Older ROM technologies included programmable- 
ROMs, or PROMs, that were fabricated with tiny fuses on the silicon die. These fuses could be 
burned only once, which prevented a manufacturer from testing each fuse before shipment. In con- 
trast, EPROMs are fairly inexpensive to manufacture, and their erasure capability allows them to be 
completely tested by the semiconductor manufacturer before shipment to the customer. Only a full-custom mask-programmed chip, a true ROM, is cheaper to manufacture than an EPROM on a bit-for-bit basis. However, mask ROMs are rare, because they require a fixed data image that cannot be changed without modifying the chip design, and software changes are fairly common.

An EPROM’s silicon bit structure consists of a special MOSFET structure whose gate traps a 
charge that is applied to it during programming. Programming is performed with a higher than nor- 
mal voltage, usually 12 V (older generation EPROMs required 21 V), that places a charge on the 
floating gate of a MOSFET as shown in Fig. 4.2. 

When the programming voltage is applied to the control gate, a charge is induced on the floating 
gate, which is electrically isolated from both the silicon substrate as well as the control gate. This 
isolation enables the floating gate to function as a capacitor with almost zero current leakage across 
the dielectric. In other words, once a charge is applied to the floating gate, the charge remains almost 
indefinitely. A charged floating gate causes the silicon that separates the MOSFET's source and 
drain contacts to electrically conduct, creating a connection from logic ground to the bit output. This 
means that a programmed EPROM bit reads back as a 0. An unprogrammed bit reads back as a 1, be- 
cause the lack of charge on the floating gate does not allow an electrical connection between the 
source and drain. 
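Programming can only charge a floating gate, so it can only turn 1 bits into 0 bits. Returning a bit to 1 requires erasure. This Python sketch models that rule at the byte level. It is a simplified logical model that ignores programming pulses, verification, and timing, and the function names are illustrative.

```python
ERASED = 0xFF   # an erased (unprogrammed) byte reads back as all 1s


def program_byte(current, new):
    """Programming can only clear bits, so the result is old AND new."""
    return current & new


def needs_erase(current, new):
    """True if writing `new` would require changing any 0 bit back to 1."""
    return (current & new) != new


print(hex(program_byte(ERASED, 0x5A)))   # 0x5a: an erased byte accepts any value
print(needs_erase(0x5A, 0x50))           # False: only clears additional bits
print(needs_erase(0x5A, 0xA5))           # True: 0 -> 1 changes need an erase
print(hex(program_byte(0x5A, 0xA5)))     # 0x0: programming alone gives the wrong value
```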
FIGURE 4.2 EPROM silicon bit structure.

Once programmed, the charge on the floating gate cannot be removed electrically. UV photons
cause the dielectric to become slightly conductive, allowing the floating gate’s charge to gradually 
drain away to its unprogrammed state. This UV erasure feature is the reason why many EPROMs are 
manufactured in ceramic packages with transparent quartz windows directly above the silicon die. 
These ceramic packages are generally either DIPs or PLCCs and are relatively expensive. In the late 
1980s it became common for EPROMs to be manufactured in cheaper plastic packages without 
transparent windows. These EPROM devices are rendered one-time programmable, or OTP, because 
it is impossible to expose the die to UV light. OTP devices are attractive, because they are the least 
expensive nonmask ROM technology and provide a manufacturer with the flexibility to change soft- 
ware on the assembly line by using a new data image to program EPROMs. 

The industry standard EPROM family is the 27xxx, where the “xxx” indicates the chip’s memory 
capacity in kilobits. The 27256 and 27512 are very common and easily located devices. Older parts 
include the 2708, 2716, 2732, 2764, and 27128. There are also newer, higher-density EPROMs such 
as the 27010, 27020, and 27040 with 1 Mb, 2 Mb, and 4 Mb densities, respectively. 27xxx EPROM 
devices are most commonly eight bits wide (a 27256 is a 32,768 x 8 EPROM). Wider data words, 
such as 16 or 32 bits, are available but less common. 
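The naming convention also determines how many address lines a part needs. That is why the 27C64 has A[12:0]. The Python sketch below decodes a part number for an eight-bit-wide device. Two elements are assumptions: the function name, which is hypothetical, and the treatment of three-digit codes beginning with 0 (010, 020, 040) as megabits, which is inferred from the examples above. Check a datasheet for any particular part.

```python
def x8_geometry(part):
    """Return (words, address_lines) for an eight-bit-wide 27xxx EPROM."""
    digits = part.upper().replace("27C", "27", 1)[2:]    # drop "27" and any CMOS "C"
    if len(digits) == 3 and digits.startswith("0"):
        bits = int(digits) // 10 * 1024 * 1024          # 010, 020, 040 -> 1, 2, 4 Mb
    else:
        bits = int(digits) * 1024                       # e.g. 256 -> 256 kb
    words = bits // 8                                   # eight-bit data word
    return words, (words - 1).bit_length()              # log2(words) for a power of two


for part in ("2716", "27C64", "27256", "27010"):
    print(part, x8_geometry(part))
# 2716 (2048, 11)
# 27C64 (8192, 13)
# 27256 (32768, 15)
# 27010 (131072, 17)
```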

Older members of the 27xxx family, such as early NMOS 2716 and 2732 devices, required 21-V 
programming voltages, consumed more power, and featured access times of between 200 and 
450 ns. Newer CMOS devices are designated 27Cxxx, require a 12-V programming voltage, con- 
sume less power, and have access times as fast as 45 ns, depending on the manufacturer and device 
density. 

EPROMs are very easy to use because of their classic asynchronous interface. In most applications, 
the EPROM is treated like a ROM, so writes to the device are not an issue. Two programming control 
pins, VPP and PGM*, serve as the high-voltage source and program enable, respectively. These two
pins can be set to inactive levels and forgotten. What remains are a chip enable, CE*, an output en- 
able, OE*, an address bus, and a data output bus as shown in Fig. 4.3, using a 27C64 (8K x 8) as an 
example. 

When CE* is inactive, or high, the device is in a powered-down mode in which it consumes the 
least current — measured in microamps due to the quiescent nature of CMOS logic. When CE* and 
OE* are active simultaneously, D[7:0] follows A[12:0] subject to the device’s access time, or propa- 
gation delay. This read timing is shown in Fig. 4.4. 

When OE* is inactive, the data bus is held in a high-impedance state. A certain time after OE* goes active, tOE, the data word corresponding to the given address is driven, assuming that A1 has been stable for at least tACC. If not, tACC will determine how soon D1 is available rather than tOE. While OE* is active, the data bus transitions tACC ns after the address bus. As soon as OE* is removed, the data bus returns to a high-impedance state after tOEZ.
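The interaction between tACC and tOE reduces to whichever constraint is satisfied last. In this Python sketch the timing values are hypothetical, not taken from any datasheet. CE* is assumed to have been active early enough that it does not limit the access.

```python
def data_valid_ns(address_stable_at, oe_active_at, t_acc, t_oe):
    """Earliest time (ns) at which D[7:0] holds valid data for a read.

    The address must have been stable for tACC and OE* active for tOE;
    CE* is assumed not to be the limiting factor.
    """
    return max(address_stable_at + t_acc, oe_active_at + t_oe)


T_ACC, T_OE = 150, 70   # hypothetical access times in ns

print(data_valid_ns(0, 40, T_ACC, T_OE))    # 150: address access time dominates
print(data_valid_ns(0, 120, T_ACC, T_OE))   # 190: late OE* dominates
```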
FIGURE 4.3 27C64 block diagram.
FIGURE 4.4 EPROM asynchronous read timing.



Many microprocessors are able to directly interface to an EPROM via this asynchronous bus be- 
cause of its ubiquity. Most eight-bit microprocessors have buses that function solely in this asyn- 
chronous mode. In contrast, some high-performance 32-bit microprocessors may initially boot in a 
low-speed asynchronous mode and then configure themselves for higher performance operation af- 
ter retrieving the necessary boot code and initialization data from the EPROM. 



4.3 FLASH MEMORY 



Flash memory captured the lion's share of the nonvolatile memory market from EPROMs in the 
1990s and holds a dominant position as the industry leader to this day. Flash is an enhanced EPROM 
that can both program and erase electrically without time-consuming exposure to UV light, and it 
has no need for the associated expensive ceramic and quartz packaging. Flash does cost a small 
amount more to manufacture than EPROM, but its more flexible use in terms of electronic erasure 
more than makes up for a small cost differential in the majority of applications. Flash is found in ev- 
erything from cellular phones to automobiles to desktop computers to solid-state disk drives. It has 
enabled a whole class of flexible computing platforms that are able to upgrade their software easily 
and “on the fly” during normal operation. Similar to EPROMs, early flash devices required separate 
programming voltages. Semiconductor vendors quickly developed single-supply flash devices that 
made their use easier. 

A flash bit structure is very similar to that of an EPROM. Two key differences are an extremely 
thin dielectric between the floating gate and the silicon substrate and the ability to apply varying bias 
voltages to the source and control gate. A flash bit is programmed in the same way that an EPROM 
bit is programmed — by applying a high voltage to the control gate. Flash devices contain internal 
voltage generators to supply the higher programming voltage so that multiple external voltages are 
not required. The real difference appears when the bit is erased electrically. A rather complex quantum-mechanical behavior called Fowler-Nordheim tunneling is exploited by applying a negative
voltage to the control gate and a positive voltage to the MOSFET's source as shown in Fig. 4.5. 

The combination of the applied bias voltages and the thin dielectric causes the charge on the floating gate to drain away through the MOSFET's source. Flash devices cannot go through this program/erase cycle indefinitely. Early devices were rated for 100,000 erase cycles. Modern flash chips
are often specified up to 1,000,000 erase cycles. One million cycles may sound like a lot, but remem- 
ber that microprocessors run at tens or hundreds of millions of cycles per second. When a processor 
is capable of writing millions of memory locations each second, an engineer must be sure that the 
flash memory is used appropriately and not updated too often so as to maximize its operational life. 
Products that utilize flash memory generally contain some management algorithm to ensure that the erasure limit is not reached during the product’s expected lifetime. This algorithm can be as simple as performing software updates only several times per year. Alternatively, algorithms can be smart enough to track how many times each portion of a flash device has been erased and dynamically make decisions about where to place new data accordingly.

FIGURE 4.5 Flash bit erasure.
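Tracking erase counts and steering new data by them is commonly called wear leveling. The Python sketch below always picks the least-erased block. It assumes the device is divided into equal-sized blocks and that every new write can go to any free block. The class and method names are illustrative. Real implementations must also move long-lived static data and track which blocks hold valid data.

```python
class WearLeveler:
    """Allocate flash blocks so that erase cycles are spread evenly."""

    def __init__(self, num_blocks, rated_cycles):
        self.erase_counts = [0] * num_blocks
        self.rated_cycles = rated_cycles

    def allocate(self):
        """Erase the least-worn block and return its index for new data."""
        block = min(range(len(self.erase_counts)), key=self.erase_counts.__getitem__)
        if self.erase_counts[block] >= self.rated_cycles:
            raise RuntimeError("all blocks have reached their rated erase cycles")
        self.erase_counts[block] += 1
        return block


leveler = WearLeveler(num_blocks=4, rated_cycles=1_000_000)
used = [leveler.allocate() for _ in range(10)]
print(used)                    # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
print(leveler.erase_counts)    # [3, 3, 2, 2]
```

Without leveling, a single heavily rewritten block uses up its rated cycles on its own. Spreading the erasures lets the device absorb roughly the number of blocks times the rated cycles before any block reaches its limit.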

Flash chips are offered in two basic categories, NOR and NAND, named according to the circuits 
that make up each memory bit. NOR flash is a random access architecture that often functions like 
an EPROM when reading data. NOR memory arrays are directly accessed by a microprocessor and 
are therefore well suited for storing boot code and other programs. NAND flash is a sequential ac- 
cess architecture that segments the memory into many pages, typically 256 or 512 bytes. Each page 
is accessed as a discrete unit. As such, NAND flash does not provide the random access interface of 
a NOR flash. In return for added interface complexity and slower response time, NAND flash pro- 
vides greater memory density than NOR flash. NAND’s greater density makes it ideal for bulk data 
storage. If programs are stored in NAND flash, they must usually be loaded into RAM before they 
can be executed, because the NAND page architecture is not well suited to a microprocessor’s read/ 
write patterns. NAND flash is widely used in consumer electronic memory cards such as those used 
in digital cameras. NAND flash devices are also available in discrete form for dense, nonvolatile data 
storage in a digital system. 
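NAND's page-oriented access can be illustrated with a small simulation. This sketch assumes 512-byte pages (one of the sizes mentioned above) and models the device as a Python dict mapping page numbers to page contents. The function names are hypothetical, and no real NAND command set is modeled. The point is that even a four-byte read spanning a page boundary requires two whole pages to be transferred.

```python
PAGE_SIZE = 512  # bytes per page (assumed; the text cites 256 or 512)


def nand_location(byte_address, page_size=PAGE_SIZE):
    """Split a linear byte address into (page number, offset within page)."""
    return divmod(byte_address, page_size)


def read_bytes(pages, byte_address, length, page_size=PAGE_SIZE):
    """Read a byte range from a simulated NAND device.

    Each page touched is fetched as a whole unit, as a NAND interface
    requires. Returns (data, number of pages transferred)."""
    out = bytearray()
    pages_read = 0
    while length > 0:
        page, offset = nand_location(byte_address, page_size)
        buffer = pages[page]               # entire page moved into a buffer
        pages_read += 1
        chunk = buffer[offset:offset + length]
        if not chunk:
            raise ValueError("page shorter than page_size")
        out += chunk
        byte_address += len(chunk)
        length -= len(chunk)
    return bytes(out), pages_read


# Three simulated pages, each filled with its own page number.
pages = {p: bytes([p]) * PAGE_SIZE for p in range(3)}
print(read_bytes(pages, 510, 4))  # (b'\x00\x00\x01\x01', 2)
```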

NOR flash is discussed here because of its direct microprocessor interface capability. When oper- 
ating in read-only mode, many NOR flash devices function similarly to EPROMs with a simple 
asynchronous interface. More advanced flash devices implement high-performance synchronous 
burst transfer modes that increase their bandwidth for special applications. Most NOR flash chips, 
however, are used for general processor boot functions where high memory bandwidth is not a main 
concern. Therefore, an inexpensive asynchronous interface similar to that of the 27xxx EPROMs is adequate.

Writing to flash memory is not as simple as presenting new data to the chip and then applying a
write enable, as is done with a RAM. As with an EPROM, an already programmed bit must first be
erased before it can be reprogrammed. This erasure process takes longer than a simple read access.
As Fig. 4.5 shows, the programming and source contacts of each flash bit must be switched to spe- 
cial voltage levels for erasure. Instead of building switches for each individual bit, the complexity of 
the silicon implementation is reduced by grouping many bits together into blocks. Therefore, a flash 
device is not erased one bit or byte at a time, but rather a block at a time. Flash chips are segmented 
into multiple blocks, depending on the particular device and manufacturer. This block architecture is 
beneficial in that the whole device does not have to be erased, allowing sensitive information to be 
preserved. A good system design takes the flash block structure into account when deciding where to 
locate certain pieces of data or sections of software, thereby requiring the erasure of only a limited 
number of blocks when performing an update of system software or configuration. The block era- 
sure process takes a relatively long time when measured in microprocessor clock cycles. Given that 
the erase procedure clears an entire range of memory, special algorithms are built into the chips to 
protect the blocks by requiring a special sequence of flash accesses before the actual erase process is 
initiated. 
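The consequence of block-granular erasure for software layout can be shown with a little arithmetic. This sketch assumes uniform 16-kB blocks (the sector size of the 29LV010B discussed later in this section) and uses a hypothetical helper name. It lists every block that an update of a given byte range would force the system to erase.

```python
BLOCK_SIZE = 16 * 1024  # assumed uniform block size in bytes


def blocks_to_erase(start, length, block_size=BLOCK_SIZE):
    """Return the indices of all blocks overlapped by [start, start + length)."""
    if length <= 0:
        return []
    first = start // block_size
    last = (start + length - 1) // block_size
    return list(range(first, last + 1))


# A 16-kB configuration area aligned to a block boundary touches one block...
print(blocks_to_erase(0x4000, 0x4000))  # [1]
# ...but the same area shifted by 256 bytes straddles two blocks.
print(blocks_to_erase(0x4100, 0x4000))  # [1, 2]
```

Updating the misaligned area would also erase, and therefore require rewriting, whatever else shares blocks 1 and 2.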
Flash chips are not as standard as EPROMs, because different manufacturers have created their
own programming algorithms, memory organizations, and pin assignments. Many conventional par- 
allel data-bus devices have part numbers with “28F” or “29F” prefixes. For example, Advanced Mi- 
cro Devices’ flash memory family is the 29Fxxx. Intel’s family is the 28Fxxx. Aside from 
programming differences, the size and organization of blocks within a flash device is a key func- 
tional difference that may make one vendor’s product better than another for a particular application. 
Two main attributes of flash chips are uniformity of block size and hardware protection of blocks. 

Uniform-block devices divide the memory array into equally sized blocks. Boot-block devices di- 
vide the memory array into one or more small boot blocks and then divide the remainder of memory 
into equally sized blocks. Boot-block devices are popular, because the smaller boot blocks can be 
used to hold the rarely touched software that is used to initialize the system's microprocessor when it 
first turns on. Boot code is often a small fraction of the system’s overall software. Due to its critical 
nature, boot code is often kept simple to reduce the likelihood of errors. Therefore, boot code is sel- 
dom updated. In contrast, other flash ROM contents, such as application code and any application 
data, may be updated more frequently. Using a boot-block device, a microprocessor’s boot code can 
be stored away into its own block without wasting space and without requiring that it be disturbed 
during a more general software update. Applications that do not store boot code in flash may not 
want the complexity of dealing with nonuniform boot blocks and may therefore be better suited to 
uniform-block devices. 

Hardware protection of blocks is important when some blocks hold very sensitive information 
whose loss could cause permanent damage to the system. A common example of this is boot code 
stored in a boot block; if the boot code is corrupted, the CPU will fail to initialize properly the next 
time it is reset. A flash device can implement a low-level protection scheme whereby write/erase op- 
erations to certain blocks can be disabled with special voltage levels and data patterns presented to 
the device. 

Examples of real flash devices serve well to explain how this important class of nonvolatile memory
functions. Advanced Micro Devices (AMD) manufactures two similar flash devices: the 29LV010B and
the 29LV001B. Both devices are 3.3-V, 1-Mb (128k x 8) parts that offer hardware sector protection.
The 29LV010B is a uniform-sector device, and the 29LV001B is a boot-sector device. AMD uses the
term sector instead of block. Both chips have the same basic functional block diagram shown in
Fig. 4.6.



FIGURE 4.6 AMD 29LV010B/29LV001B block diagram. (One element of the diagram is labeled
“Present on 29LV001B only.”)
Modern flash devices require only a single supply voltage and contain on-chip circuitry to create
the nonstandard programming and erasure voltages required by the memory array. Control logic de- 
termines which block is placed into erase or program mode at any given time as requested by the mi- 
croprocessor with a predefined flash control algorithm. AMD's algorithm consists of six special 
write transactions to the flash: two unlock cycles, a setup command, two more unlock cycles, and the 
specific erase command. This sequence is detailed in Table 4.1. If interrupted, the sequence must be 
restarted to ensure integrity of the command. 



TABLE 4.1 29LV010B/29LV001B Erase Sequence*

Cycle   Write Address    Write Data
1       0x555            0xAA
2       0x2AA            0x55
3       0x555            0x80
4       0x555            0xAA
5       0x2AA            0x55
6       Erase address    Erase command

*Source: Am29LV001B, Pub #21557, and Am29LV010B, Pub #22140, Advanced Micro Devices, 2000.



For a whole-chip erase, the address/data in cycle 6 is 0x555/0x10. For a single-sector erase, the
address/data in cycle 6 is the sector address/0x30. Multiple erase commands may be queued together
to reduce the total time spent by the internal control logic erasing its sectors. While a command is
executing, the data bus becomes a status communication mechanism. The microprocessor can
periodically poll the device by reading from any valid address. While the erase is in progress, a value
other than 0xFF will be returned. As soon as the erase has completed, the microprocessor will read
back 0xFF.
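The erase command and completion polling can be sketched in Python. The (address, data) pairs come from Table 4.1 and the paragraph above. The `write` and `read` callbacks standing in for the processor's bus accesses are hypothetical, as are the function names and the polling limit. Following the simplified description in the text, completion is detected when a read returns 0xFF; AMD's data sheets define more detailed status bits that a production driver would check instead.

```python
UNLOCK = [(0x555, 0xAA), (0x2AA, 0x55)]  # two unlock cycles


def erase_sequence(sector_address=None):
    """Return the six (address, data) writes of Table 4.1.

    sector_address=None requests a whole-chip erase; otherwise the
    given sector is erased."""
    if sector_address is None:
        final = (0x555, 0x10)              # chip-erase command
    else:
        final = (sector_address, 0x30)     # sector-erase command
    return UNLOCK + [(0x555, 0x80)] + UNLOCK + [final]


def erase(write, read, sector_address=None, max_polls=1_000_000):
    """Issue an erase command, then poll until the device reads back 0xFF."""
    for address, data in erase_sequence(sector_address):
        write(address, data)
    poll_address = 0 if sector_address is None else sector_address
    for _ in range(max_polls):
        if read(poll_address) == 0xFF:
            return True
    return False                           # timed out; caller must recover


# Simulated device that reports "busy" twice before finishing.
status = iter([0x00, 0x00, 0xFF])
writes = []
print(erase(lambda a, d: writes.append((a, d)), lambda a: next(status), 0x4000))  # True
```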

Writes to previously erased flash memory locations are accomplished with a similar technique. 
For each location to be programmed, a four-cycle program command sequence is performed as 
shown in Table 4.2. Again, the microprocessor polls for command completion by reading from the 
device. This time, however, the address polled must be the write address. When the microprocessor 
reads back the data that it has written, the command is known to have completed. 

TABLE 4.2 29LV010B/29LV001B Programming Sequence

Cycle   Write Address    Write Data
1       0x555            0xAA
2       0x2AA            0x55
3       0x555            0xA0
4       Write address    Write data
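A matching sketch for programming one byte, using the four cycles of Table 4.2, follows. As before, `write`, `read`, and the function names are hypothetical placeholders for the processor's bus accesses, and the target location is assumed to have been erased already. The simulated device in the usage example follows the flash rule that programming can only change bits from 1 to 0, which is why erasure must come first.

```python
def program_sequence(address, data):
    """Return the four (address, data) writes of Table 4.2 for one byte."""
    return [(0x555, 0xAA), (0x2AA, 0x55), (0x555, 0xA0), (address, data & 0xFF)]


def program_byte(write, read, address, data, max_polls=100_000):
    """Program one byte, then poll the same address until it reads back data."""
    data &= 0xFF
    for cycle_address, cycle_data in program_sequence(address, data):
        write(cycle_address, cycle_data)
    for _ in range(max_polls):
        if read(address) == data:
            return True
    return False


# Simulated flash: erased bytes read 0xFF, and programming can only clear bits.
memory = {}
log = []

def fake_write(address, data):
    log.append((address, data))
    if len(log) % 4 == 0:                  # fourth cycle carries the real data
        memory[address] = memory.get(address, 0xFF) & data

def fake_read(address):
    return memory.get(address, 0xFF)

print(program_byte(fake_write, fake_read, 0x1234, 0x5A))  # True
```

Reprogramming 0x1234 with a different value without erasing it first would fail in this model, because bits already cleared to 0 cannot be set back to 1.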



Other ancillary commands are supported, including device reset and identification operations. 
The 29LV001B includes a hardware-reset signal in addition to the soft reset command. Identification 
enables the microprocessor to verify exactly which flash device it is connected to and which sectors
have been hardware protected. Identification is useful for a removable flash module that can be built 
with different parts for specific capacities. Protection status is useful so that software running on the 
microprocessor can know if it is possible to program certain areas of the memory. 

Hardware sector protection is accomplished during the time of system manufacture by applying a 
higher than normal voltage to designated pins on the flash device using special equipment. The des- 
ignated pins on the 29LV010B/29LV001B are address bit 9, A9, and the output enable, OE*. These 
pins are driven to 12 V while the address of the sector to be protected is applied to other address 
pins. During normal operation, there is no way for 12 V to be driven onto these signals, preventing 
the protected sectors from being unprotected while in circuit. The exception to this is a feature on the 
29LV001B that AMD calls temporary sector unprotect. Previously protected sectors can be tempo- 
rarily unprotected by driving 12 V onto the RESET* pin with specific circuitry for this purpose. Tak- 
ing advantage of this feature makes it possible to modify the most sensitive areas of the flash by 
locating a hardware unprotect enable signal in a logic circuit separate from the flash chip itself. 

The major difference between the 29LV010B and 29LV001B is their sector organization. The
29LV010B contains 8 uniform sectors of 16 kB each. The 29LV001B contains 10 sectors of nonuniform
size. AMD manufactures two variants of the 29LV001B, with top and bottom boot sector architectures,
and their sector organization is listed in Table 4.3.



TABLE 4.3 29LV001B Sector Organization

Sector Number   Top Boot Sector   Bottom Boot Sector
0               16 kB             8 kB
1               16 kB             4 kB
2               16 kB             4 kB
3               16 kB             16 kB
4               16 kB             16 kB
5               16 kB             16 kB
6               16 kB             16 kB
7               4 kB              16 kB
8               4 kB              16 kB
9               8 kB              16 kB



The reason for these mirrored architectures is that some microprocessors contain reset vectors to- 
ward the top of their address space and some toward the bottom. It is a better fit to locate the boot 
sectors appropriately depending on a system's CPU. As with any complex IC, there are many details 
relating to the operation of these flash ICs. Refer to AMD's data sheets for more information. 
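Because the sectors of a boot-sector device are not all the same size, software cannot locate a sector with a simple shift. The following sketch, with hypothetical names, finds the sector containing a byte address by walking the sizes listed in Table 4.3, with the eight 16-kB sectors of the 29LV010B included for comparison. It assumes, as the table's ordering suggests, that sector 0 occupies the lowest addresses. Such a routine could supply the sector address needed in cycle 6 of a sector-erase command; exactly which address bits the device decodes as the sector address should be taken from AMD's data sheets.

```python
KB = 1024

# Sector sizes in bytes, listed from sector 0 (lowest addresses) upward.
TOP_BOOT = [16 * KB] * 7 + [4 * KB, 4 * KB, 8 * KB]      # 29LV001B, top boot
BOTTOM_BOOT = [8 * KB, 4 * KB, 4 * KB] + [16 * KB] * 7   # 29LV001B, bottom boot
UNIFORM = [16 * KB] * 8                                  # 29LV010B


def sector_of(address, sizes):
    """Return (sector number, sector base address) for a byte address."""
    if address < 0:
        raise ValueError("negative address")
    base = 0
    for number, size in enumerate(sizes):
        if address < base + size:
            return number, base
        base += size
    raise ValueError("address beyond the end of the device")


print(sector_of(0x1FFFF, TOP_BOOT))     # (9, 122880), i.e., base 0x1E000
print(sector_of(0x03000, BOTTOM_BOOT))  # (2, 12288), i.e., base 0x3000
```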



4.4 EEPROM 



Electrically erasable programmable ROM, or EEPROM, is flash’s predecessor. In fact, some people 
still refer to flash as “flash EEPROM,” because the underlying structures are very similar. EEPROM, 
sometimes written as E²PROM, is more expensive to manufacture per bit than EPROM or flash, because
individual bytes may be erased randomly without affecting neighboring locations. Because of
the complexity and associated cost of making each byte individually erasable, EEPROM is not com- 
monly manufactured in large densities. Instead, it has served as a niche technology for applications 
that require small quantities of flexible reprogrammable ROM. Common uses for EEPROM are as 
program memory in small microprocessors with embedded memory and as small nonvolatile mem- 
ory arrays to hold system configuration information. Serial EEPROM devices can be found in eight- 
pin DIP or SOIC packages and provide up to several kilobytes of memory. Their serial interface, 
small size, and low power consumption make them very practical as a means to hold serial numbers, 
manufacturing information, and configuration data. 

Parallel EEPROM devices are still available from manufacturers as the 28xx family. They are pin 
and function compatible (for reads) with the 27xxx EPROM family that they followed. Some appli- 
cations requiring reprogrammable nonvolatile memory may be more suited to EEPROM than flash, 
but flash is a compelling choice, because it is the more mainstream technology with the resultant 
benefit of further cost reduction. 

Serial EEPROMs, however, are quite popular due to their very small size and low power con- 
sumption. They can be squeezed into almost any corner of a system to provide small quantities of 
nonvolatile storage. Microchip Technology is a major manufacturer of serial EEPROMs and offers 
the 24xx family. Densities range from 16 bytes to several kilobytes. Given that serial interfaces use 
very few pins, these EEPROMs are manufactured in packages ranging from eight-pin DIPs to five- 
pin SOT-23s that are smaller than a fingernail. Devices of this sort are designed to minimize system 
impact rather than for speed. Their power consumption is measured in nanoamps and microamps in- 
stead of milliamps, as is the case with standard flash, parallel EEPROM, and EPROM devices. 

Microchip’s 24LC00 is a 16-byte serial EEPROM with a two-wire serial bus. It requires only four
pins: two for power and two for data communication. Like most modern flash devices, the 24LC00
is rated for one million write cycles. When not being accessed, the 24LC00 consumes about 250 nA!
When active, it consumes only 500 µA. For added flexibility, the 24LC00 can operate over a variety
of supply voltages from 2.5 to 6.0 V. Speed is not a concern here: writes take up to 4 ms to complete,
which is not a problem when writing only a few bytes on rare occasions.
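A quick calculation shows why the 24LC00's slow writes and finite endurance are acceptable in its intended role. The sketch uses the figures quoted above (up to 4 ms per write, one million write cycles). It assumes every byte is written individually with the full write time, and its function names are hypothetical.

```python
WRITE_TIME_S = 4e-3      # worst-case time per write, from the text
ENDURANCE = 1_000_000    # rated write cycles, from the text
HOURS_PER_YEAR = 24 * 365


def worst_case_write_time(num_bytes):
    """Seconds needed to write num_bytes, one byte per write cycle."""
    return num_bytes * WRITE_TIME_S


def years_until_worn(writes_per_hour):
    """Years before one location reaches its rated number of writes."""
    return ENDURANCE / writes_per_hour / HOURS_PER_YEAR


print(round(worst_case_write_time(16), 3))  # 0.064: the whole 16-byte array
print(round(years_until_worn(1), 1))        # 114.2: updating once per hour
```

The same location rewritten once per second would instead reach its rating in under two weeks, which is why such devices suit rarely changed data.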



4.5 ASYNCHRONOUS SRAM 
Static RAM, or SRAM, is the most basic and easy to use type of volatile memory and is found in
almost every computer in one form or another. An SRAM device is conceptually easy to understand,
consisting of an array of latches along with control and decode logic to resolve the address that is
being read or written at any given time. Each latch is a feedback circuit that traps and maintains a
particular logic state. A typical SRAM bit implementation is shown in Fig. 4.7.

FIGURE 4.7 SRAM bit feedback latch.


An SRAM latch is created by connecting two inverters in a loop. One side of the loop remains sta- 
ble at the desired logic state, and the other remains stable at the opposite state. Inverters are used 
rather than noninverting buffers, because an inverter is the simplest logic element to construct. The 
two pass transistors on either side of the latch enable both writing and reading. When writing, the 
transistors turn on and force each half of the loop to whatever state is driven on the vertical bit lines. 
When reading, the transistors also turn on, but the bit lines are sensed rather than driven. Typical 
SRAM implementations require six transistors per bit of memory: two transistors for each inverter
and the two pass transistors. Some implementations use only a single transistor per inverter, requir- 
ing only four transistors per bit. 

Discrete asynchronous SRAM devices have been around for decades. In the 1980s, the 6264 and
62256 were manufactured by multiple vendors and used in applications that required simple RAM 
architectures with relatively quick access times and low power consumption. The 62xxx family is 
numbered according to its density in kilobits. Hence, the 6264 provides 65,536 bits of RAM ar- 
ranged as 8k x 8. The 62256 provides 262,144 bits of RAM arranged as 32k x 8. Being manufac- 
tured in CMOS technology and not using a clock, these devices consume very little power and draw 
only microamps when not being accessed. 
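The 62xxx naming convention lends itself to a short decoding example. This sketch assumes, as the text describes, that the digits after "62" give the density in kilobits and that the parts are organized eight bits wide. The function name is hypothetical. It also derives the number of address lines needed to select each byte.

```python
def decode_62xxx(part_number):
    """Return (total bits, number of bytes, address lines) for a 62xxx SRAM."""
    if not part_number.startswith("62"):
        raise ValueError("not a 62xxx part number")
    kilobits = int(part_number[2:])      # "6264" -> 64, "62256" -> 256
    bits = kilobits * 1024
    words = bits // 8                    # x8 organization
    address_lines = (words - 1).bit_length()
    return bits, words, address_lines


print(decode_62xxx("6264"))   # (65536, 8192, 13): 8k x 8
print(decode_62xxx("62256"))  # (262144, 32768, 15): 32k x 8
```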

The 62xxx family pin assignment is virtually identical to that of the 27xxx EPROM family, en- 
abling system designs where either EPROM or SRAM can be substituted into the same location with 
only a couple of jumpers to set for unique signals such as the program-enable on an EPROM or 
write-enable on an SRAM. Like an EPROM or basic flash device, asynchronous SRAMs have a sim- 
ple interface consisting of address, data, chip select, output enable, and write enable. This interface 
is shown in Fig. 4.8. 

Writes are performed whenever the WE* signal is held low. Therefore, one must ensure that the 
desired address and data are stable before asserting WE* and that WE* is removed while address 
and data remain stable. Otherwise, the write may corrupt an undesired memory location. Unlike an 
EPROM, but like flash, the data bus is bidirectional during normal operation. The first two transac- 
tions shown are writes as evidenced by the separate assertions of WE* for the duration of address 
and data stability. As soon as the writes are completed, the microprocessor should release the data 
bus to the high-impedance state. When OE* is asserted, the SRAM begins driving the data bus and 
the output reflects the data contents at the locations specified on the address bus. 
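The control-signal behavior described above can be captured in a small behavioral model. This Python sketch is hypothetical and deliberately ignores timing: each call represents one interval during which the address, data, and active-low CE*, OE*, and WE* levels are stable. It assumes the common convention that a low WE* writes regardless of OE* and that the SRAM drives the data bus only for a selected read; a specific part's data sheet gives its exact truth table.

```python
class AsyncSRAM:
    """Behavioral sketch of a 62xxx-style SRAM; all control inputs are
    active low (0 = asserted)."""

    def __init__(self, words=8 * 1024):
        self.mem = [0] * words

    def cycle(self, address, ce_n, oe_n, we_n, data_in=None):
        """Apply one set of stable pin levels. Returns the byte the SRAM
        drives onto the data bus, or None when its outputs are high impedance."""
        if ce_n:                      # chip not selected: bus released
            return None
        if not we_n:                  # write: the processor drives the bus
            self.mem[address] = data_in & 0xFF
            return None
        if not oe_n:                  # read: SRAM drives the stored byte
            return self.mem[address]
        return None                   # selected but idle


sram = AsyncSRAM()
sram.cycle(0x10, ce_n=0, oe_n=1, we_n=0, data_in=0x3C)   # write 0x3C
print(sram.cycle(0x10, ce_n=0, oe_n=0, we_n=1))           # 60 (0x3C)
```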

Asynchronous SRAMs are available with access times of less than 100 ns for inexpensive parts 
and down to 10 ns for more expensive devices. Access time measures both the maximum delay be- 
tween a stable read address and its corresponding data and the minimum duration of a write cycle. 
Their ease of use makes them suitable for small systems where megabytes of memory are not re- 
quired and where reduced complexity and power consumption are key requirements. Volatile mem- 
ory doesn’t get any simpler than asynchronous SRAM. 

Prior to the widespread availability of flash, many computer designs in the 1980s utilized asyn- 
chronous SRAM with a battery backup as a means of implementing nonvolatile memory for storing 
configuration information. Because an idle SRAM draws only microamps of current, a small battery 
can maintain an SRAM’s contents for several years while the main power is turned off. Using 
SRAM in this manner has two distinct advantages over other technologies: writes are quick and 
easy, because there are no complex EEPROM or flash programming algorithms, and there is no limit 
to the number of write cycles performed over the life of the product. The downsides to this approach
are a lack of security for protecting valuable configuration information and the need for a battery to
maintain the memory contents. Requiring a battery increases the complexity of the system and also
raises the question of what happens when the battery wears out. In the 1980s, it was common for a
PC’s BIOS configuration to be stored in battery-backed CMOS SRAM. This is how terms like “the
CMOS” and “CMOS setup” entered the lexicon of PC administration.

FIGURE 4.8 62xxx SRAM interface.

SRAM is implemented not only as discrete memory chips but is commonly found integrated 
within other types of chips, including microprocessors. Smaller microprocessors or microcontrollers 
(microprocessors integrated with memory and peripherals on a single chip) often contain a quantity 
of on-board SRAM. More complex microprocessors may contain on-chip data caches implemented 
with SRAM. 



4.6 ASYNCHRONOUS DRAM 



SRAM may be the easiest volatile memory to use, but it is not the least expensive in significant
densities. Each bit of memory requires between four and six transistors. When millions or billions
of bits are required, the complexity of all those transistors becomes substantial. Dynamic RAM, or
DRAM, takes advantage of a very simple yet fragile storage component: the capacitor. A capacitor
holds an electrical charge for a limited amount of time as the charge gradually drains away. As seen
from EPROM and flash devices, capacitors can be made to hold charge almost indefinitely, but the
penalty for doing so is significant complexity in modifying the storage element. Volatile memory
must be both quick to access and free of write-cycle limitations, two requirements that nonvolatile
memory technologies do not meet. When a capacitor is designed to have its charge quickly and easily
manipulated, the downside of rapid discharge emerges. A very efficient volatile storage element can
be created with a capacitor and a single transistor as shown in Fig. 4.9, but that capacitor loses its
contents soon after being charged. This is where the term dynamic comes from in DRAM: the memory
cell is indeed dynamic under steady-state conditions. The solution to this problem of solid-state
amnesia is to periodically refresh, or update, each DRAM bit before it completely loses its charge.

FIGURE 4.9 DRAM bit structure.

As with SRAM, the pass transistor enables both reading and writing the state of the storage ele- 
ment. However, a single capacitor takes the place of a multitransistor latch. This significant reduc- 
tion in bit complexity enables much higher densities and lower per-bit costs when memory is 
implemented in DRAM rather than SRAM. This is why main memory in most computers is imple- 
mented using DRAM. The trade-off for cheaper DRAM is a degree of increased complexity in the 
memory control logic. The number one requirement when using DRAM is periodic refresh to main- 
tain the contents of the memory. 

DRAM is implemented as an array of bits with rows and columns as shown in Fig. 4.10. Unlike 
SRAM, EPROM, and flash, DRAM functionality from an external perspective is closely tied to its 
row and column organization. 

SRAM is accessed by presenting the complete address simultaneously. A DRAM address is presented
in two parts: a row and a column address. The row and column addresses are multiplexed onto the
same set of address pins to reduce package size and cost. First the row address is loaded, or strobed,
into the row address latch via row address strobe, or RAS*, followed by the column address with
column address strobe, or CAS*. Read data propagates to the output after a specified access time.
Write data is presented at the same time as the column address, because it is the column strobe that
actually triggers the transaction, whether read or write. It is during the column address phase that
WE* and OE* take effect.
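Address multiplexing amounts to splitting a linear address into two fields. This sketch uses the 4,096 x 2,048 internal array of the 64-Mb example later in this section (12 row bits and 11 column bits). It assumes the controller maps the upper address bits to the row, although real controllers may choose a different mapping. The function name is hypothetical.

```python
ROW_BITS = 12   # 4,096 rows
COL_BITS = 11   # 2,048 columns


def split_address(address):
    """Return the (row, column) values presented on the shared address
    pins with RAS* and then CAS*."""
    if not 0 <= address < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address out of range")
    row = address >> COL_BITS
    column = address & ((1 << COL_BITS) - 1)
    return row, column


# 23 address bits fit on max(ROW_BITS, COL_BITS) = 12 multiplexed pins.
print(split_address(0x12345))  # (36, 837)
```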
FIGURE 4.10 DRAM architecture.

Sense amplifiers on the chip are necessary to detect the minute charges that are held in the 
DRAM’s capacitors. These amplifiers are also used to assist in refresh operations. It is the memory 
controller’s responsibility to maintain a refresh timer and initiate refresh operations with sufficient 
frequency to guarantee data integrity. Rather than refreshing each bit separately, an entire row is re- 
freshed at the same time. An internal refresh counter increments after each refresh so that all rows, 
and therefore all bits, will be cycled through in order. When a refresh begins, the refresh counter 
enables a particular memory row. The contents of the row are detected by the sense amplifiers and 
then driven back into the bit array to recharge all the capacitors in that row. Modern DRAMs typi- 
cally require a complete refresh every 64 ms. A 64-Mb DRAM organized as 8,388,608 words x 8 
bits (8 MB) with an internal array size of 4,096 x 2,048 bytes would require 4,096 refresh cycles 
every 64 ms. Refresh cycles need not be evenly spaced in time but are often spread uniformly for 
simplicity. 
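The refresh requirement in the 64-Mb example translates directly into a timer setting. The sketch below, with hypothetical function names, computes the spacing of uniformly distributed refresh cycles and the fraction of time spent refreshing. The 100-ns refresh cycle time is an assumed, illustrative figure rather than a value from any data sheet.

```python
def refresh_interval_us(rows, period_ms=64.0):
    """Microseconds between refresh cycles when spread evenly over period_ms."""
    return period_ms * 1000.0 / rows


def refresh_overhead(rows, refresh_cycle_ns, period_ms=64.0):
    """Fraction of the refresh period occupied by refresh cycles."""
    return rows * refresh_cycle_ns / (period_ms * 1e6)


print(refresh_interval_us(4096))      # 15.625
print(refresh_overhead(4096, 100.0))  # 0.0064 (assumed 100-ns cycle)
```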

The complexity of performing refresh is well worth the trouble because of the substantial cost and 
density improvements over SRAM. One downside of DRAM that can only be partially compensated 
for is its slower access time. A combination of its multiplexed row and column addressing scheme 
plus its large memory arrays with complex sense and decode logic make DRAM significantly slower 
than SRAM. Mainstream computing systems deal with this speed problem by implementing SRAM- 
based cache mechanisms whereby small chunks of memory are prefetched into fast SRAM so that 
the microprocessor does not have to wait as long for new data that it requests. 

Asynchronous DRAM was the prevailing DRAM technology until the late 1990s, when synchro- 
nous DRAM, or SDRAM, emerged as the dominant solution to main memory. At its heart, SDRAM 
works very much like DRAM but with a synchronous bus interface that enables faster memory trans- 
actions. It is useful to explore how older asynchronous DRAM works so as to understand SDRAM. 
SDRAM will be covered in detail later in the book. 

RAS* and CAS* are the two main DRAM control signals. They not only tell the DRAM chip 
which address is currently being asserted, they also initiate refresh cycles and accelerate sequential 
transactions to increase performance. A basic DRAM read works as shown in Fig. 4.11. CE* and
OE* are both assumed to be held active (low) throughout the transaction. 

A transaction begins by asserting RAS* to load the row address. The strobes are falling-edge sen- 
sitive, meaning that the address is loaded on the falling edge of the strobe, sometime after which the 
address may change. Asynchronous DRAMs are known for their myriad detailed timing require- 
ments. Every signal’s timing relative to itself and other signals is specified in great detail, and these 
parameters must be obeyed for reliable operation. RAS* is kept low for the duration of the
transaction. Assertion of CAS* loads the column address into the DRAM as well as the read or write
status of the transaction. Some time later, the read data is made available on the data bus. After
waiting for a sufficient time for the DRAM to return the read data, the memory controller removes
RAS* and CAS* to terminate the transaction.

FIGURE 4.11 Basic DRAM read (CE* = 0, OE* = 0).

Basic writes are similar to single reads as shown in Fig. 4.12. Again, CE* is assumed to be held 
active, and, being a write, OE* is assumed to be held inactive throughout the transaction. 

Like a read, the write transaction begins by loading the row address. From this it is apparent that 
there is no particular link between loading a row address and performing a read or a write. The iden- 
tity of the transaction is linked to the falling edge of CAS*, when WE* is asserted at about the same 
time that the column address and write data are asserted. DRAM chips require a certain setup and 
hold time for these signals around the falling edge of CAS*. Once the timing requirements are met, 
address can be deasserted prior to the rising edge of CAS*. 

A read/write hybrid transaction, called a read-modify-write, is also supported to improve the
efficiency of the memory subsystem. In a read-modify-write, the microprocessor fetches a word from
memory, performs a quick modification to it, and then writes it back as part of the same original 
transaction. This is an atomic operation, because it functions as an indivisible unit and cannot be in- 
terrupted. Figure 4.13 shows the timing for the read-modify-write. Note that CAS* is held for a 
longer period of time, during which the microprocessor may process the read-data before asserting 
WE* along with the new data to be written. 

Original DRAMs were fairly slow. This was partly because older silicon processes limited the de- 
code time of millions of internal addresses. It was also a result of the fact that accessing a single lo- 
cation required a time-consuming sequence of RAS* followed by CAS*. In comparison, an SRAM 
is quick and easy: assert the address in one step and grab the data. DRAM went through an architec- 
tural evolution that replaced the original devices with fast-page mode (FPM) devices that allow more 
efficient accesses to sequential memory locations. FPM DRAMs provide a substantial increase in 
usable memory bandwidth for the most common DRAM application: CPU memory. These devices 
take advantage of the tendency of a microprocessor’s memory transactions to be sequential in nature. 
FIGURE 4.12 Basic DRAM write (CE* = 0, OE* = 1).
FIGURE 4.13 Read-modify-write transaction.

Software does occasionally branch back and forth in its memory space. Yet, on the whole, software 
moves through portions of memory in a linear fashion. FPM devices enable a DRAM controller to 
load a row-address in the normal manner using RAS* and then perform multiple CAS* transactions 
using the same row-address. Therefore, DRAMs end their transaction cycles with the rising edge of 
RAS*, because they cannot be sure if more reads or writes are coming until RAS* rises, indicating 
that the current row-address can be released. 

FPM technology, in turn, gave way to extended-data out (EDO) devices that extend the time read 
data is held valid. Unlike its predecessors, an EDO DRAM does not disable the read data when 
CAS* rises. Instead, it waits until either the transaction is complete (RAS* rises), OE* is deasserted, 
or until CAS* begins a new page-mode access. While FPM and EDO DRAMs are distinct types of 
devices, EDO combines the page-mode features of FPM and consequently became more attractive to 
use. The following timing discussion uses EDO functionality as the example. 

Page-mode transactions hold RAS* active and cycle CAS* multiple times to perform reads and 
writes as shown in Figs. 4.14 and 4.15. Each successive CAS* falling edge loads a new column
address and causes either a read or write to be performed. In the read case, EDO's benefit can be properly
observed. Rather than read data being removed when CAS* rises, it remains asserted until just after the 
next falling edge of CAS* or the rising edge of RAS* that terminates the page-mode transaction. 
FIGURE 4.14 Page-mode reads.
FIGURE 4.15 Page-mode writes.

There are some practical limits to the duration of a page-mode transaction. First, there is an
absolute maximum time during which RAS* can remain asserted. The durations of RAS* and CAS* are
closely specified to guarantee proper operation of the DRAM. Operating the DRAM with a mini- 
mum CAS* cycle time and a maximum RAS* assertion time will yield a practical limitation on the 
data burst that can be read or written without reloading a new row address. In reality, a common 
asynchronous DRAM can support over 1,000 back-to-back accesses for a given row-address. 
DRAM provides its best performance when operated in this manner. The longer the burst, the less 
overhead is experienced for each byte transferred, because the row-address setup time is amortized 
across each word in the burst. Cache subsystems on computers help manage the bursty nature of 
DRAM by swallowing a set of consecutive memory locations into a small SRAM cache where the 
microprocessor will then have easy access to them later without having to wait for a lengthy DRAM 
transaction to execute. 
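The amortization argument can be made concrete with a simple timing model. The numbers here are hypothetical: 60 ns of row-address overhead per transaction, a 25-ns CAS* cycle per word, and a 100-µs maximum RAS* assertion time. Real values come from a specific DRAM's data sheet. The functions, also hypothetical, compute the average time per word for a burst and the longest burst that fits within one RAS* assertion.

```python
ROW_OVERHEAD_NS = 60.0      # assumed RAS*-to-first-CAS* overhead
CAS_CYCLE_NS = 25.0         # assumed page-mode CAS* cycle time
RAS_MAX_NS = 100_000.0      # assumed maximum RAS* assertion time


def ns_per_word(burst_length, row_ns=ROW_OVERHEAD_NS, cas_ns=CAS_CYCLE_NS):
    """Average time per word when one row setup is shared by the burst."""
    return (row_ns + burst_length * cas_ns) / burst_length


def max_burst(ras_max_ns=RAS_MAX_NS, row_ns=ROW_OVERHEAD_NS, cas_ns=CAS_CYCLE_NS):
    """Longest burst that completes before RAS* must be deasserted."""
    return int((ras_max_ns - row_ns) // cas_ns)


for n in (1, 4, 1000):
    print(n, ns_per_word(n))   # 85.0, 40.0, then about 25.06
print(max_burst())             # 3997
```

Under these assumptions a single-word access costs more than three times as much per word as a long burst, which is the startup penalty that caches are designed to hide.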

The second practical limitation on page-mode transactions, and all DRAM transactions in gen- 
eral, is refresh overhead. The DRAM controller must be smart enough to execute periodic refresh 
operations at the required frequency. Even if the microprocessor is requesting more data, refresh 
must take priority to maintain memory integrity. At any given instant in time, a scheduled refresh op- 
eration may be delayed slightly to accommodate a CPU request, but not to the point where the con- 
troller falls behind and fails to execute the required number of refresh operations. There are a variety 
ways to initiate a refresh operation, but most involve a so-called CAS-before-RAS signaling where 
the normal sequence of the address strobes is reversed to signal a refresh. Asserting CAS* before 
RAS* signals the DRAM’s internal control logic to perform a row-refresh at the specific row indi- 
cated by its internal counter. Following this operation, the refresh counter is incremented in prepara- 
tion for the next refresh event. 
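The scheduling rule described above lets refresh yield to the CPU briefly, but never long enough to fall behind. A toy model of that rule follows. The class name, the tick-based timing, and the "maximum backlog" parameter are hypothetical simplifications and are not taken from any particular DRAM controller.

```python
class RefreshScheduler:
    """Toy DRAM controller arbitration: a refresh may be postponed to serve the
    CPU, but once max_pending refreshes are owed, refresh takes priority."""

    def __init__(self, refresh_interval: int, max_pending: int):
        self.refresh_interval = refresh_interval  # ticks between required refreshes (hypothetical)
        self.max_pending = max_pending            # backlog at which refresh wins (hypothetical)
        self.pending = 0                          # refreshes owed but not yet issued
        self.refreshes_done = 0
        self.tick_count = 0

    def tick(self, cpu_request: bool) -> str:
        """Advance one controller slot and return the operation performed."""
        self.tick_count += 1
        if self.tick_count % self.refresh_interval == 0:
            self.pending += 1                     # the refresh timer says another row is due
        # Refresh if the backlog is at its limit, or if the CPU is not asking for memory.
        if self.pending and (self.pending >= self.max_pending or not cpu_request):
            self.pending -= 1
            self.refreshes_done += 1
            return "CBR refresh"                  # CAS* before RAS*; the DRAM uses its internal row counter
        return "cpu access" if cpu_request else "idle"
```

Even under a continuous stream of CPU requests, the backlog never exceeds `max_pending`. The CPU therefore sees only bounded, occasional stalls, and no required refresh is skipped.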

DRAM has numerous advantages over SRAM, but at the price of increased controller complexity 
and decreased performance in certain applications. DRAMs use multiplexed address buses, which 
saves pins and enables smaller, less expensive packaging and circuit board wiring. Most DRAMs are 
manufactured with data bus widths smaller than what is actually used in a computer to save pins. For 
example, when most computers used 8- or 16-bit data buses, most DRAMs were 1 bit wide. When 
microprocessors grew to 32- and 64-bit data buses, mainstream DRAMs grew to 4- and then 8-bit 
widths. This is in contrast to SRAMs, which have generally been offered with wide buses, starting 
out at 4 bits and then increasing to 72 bits in more modern devices. This width disparity is why most 
DRAM implementations in computers involve groups of four, eight, or more DRAMs on a single 
module. In the 1980s, eight 64k x 1 DRAMs created a 64 kB memory array. Today, eight 32M x 8 
DRAMs create a 256 MB memory array that is 64 bits wide to suit the high-bandwidth 32- or 64-bit 
microprocessor in your desktop PC. 
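The module arithmetic in the two examples above can be checked with a few lines of code. This is a minimal sketch that assumes every chip supplies its own slice of the data bus, with all chips sharing the same address. The function name is hypothetical.

```python
from typing import Tuple


def module_geometry(num_chips: int, depth: int, width_bits: int) -> Tuple[int, int]:
    """Return (capacity_bytes, bus_width_bits) for num_chips DRAMs of size
    depth x width_bits placed side by side on a common address."""
    bus_width_bits = num_chips * width_bits        # each chip drives width_bits of the bus
    capacity_bytes = depth * bus_width_bits // 8   # every address yields one full bus word
    return capacity_bytes, bus_width_bits


K, M = 1024, 1024 * 1024
print(module_geometry(8, 64 * K, 1))   # (65536, 8): 64 kB on an 8-bit bus
print(module_geometry(8, 32 * M, 8))   # (268435456, 64): 256 MB on a 64-bit bus
```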

A key architectural attribute of DRAM is its inherent preference for sequential transactions and, ac- 
cordingly, its weakness in handling random single transactions. Because of their dense silicon struc- 
tures and multiplexed address architecture, DRAMs have evolved to provide low-cost bulk memory 
best suited to burst transactions. The overhead of starting a burst transaction can be negligible when 
spread across many individual memory words in a burst. However, applications that are not well
suited to long bursts may perform poorly with DRAM, because the same startup penalty is paid
whether 1 word or 1,000 words are fetched. Such applications may work better with SRAM. Planning
memory architecture involves making these trade-offs between density/cost and performance. 



4.7 MULTIPORT MEMORY 



Most memory devices, whether volatile or nonvolatile, contain a single interface through which their 
contents are accessed. In the context of a basic computer system with a single microprocessor, this 
single-port architecture is well suited. There are some architectures, however, in which multiple
microprocessors or logic blocks require access to the same shared pool of memory. A shared pool of
memory can be constructed in two ways. First, conventional DRAM or SRAM can be combined with
external logic that takes requests from separate entities (e.g., microprocessors) and arbitrates access,
granting it to one requester at a time. When the shared memory pool is large, and when simultaneous
access by multiple requesters is not required, arbitration can be an efficient mechanism. However, the
complexity of arbitration logic may be excessive for small shared-memory pools, and arbitration does
not enable simultaneous access. The second approach, which shares memory without arbitration logic
and allows simultaneous access, is to construct a true multiport memory element. 
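The text does not prescribe an arbitration policy. Round-robin is one common choice and is assumed in the sketch below. Fixed priority and other schemes are also used. The class and method names are hypothetical.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grants a shared memory to one requester per cycle. The most recently
    served requester moves to the back of the line, so no one is starved."""

    def __init__(self, num_requesters: int):
        self.n = num_requesters
        self.last = num_requesters - 1   # so requester 0 has first priority

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the requester granted this cycle, or None if idle."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n   # search starting just after the last winner
            if requests[idx]:
                self.last = idx
                return idx
        return None


arb = RoundRobinArbiter(2)
print([arb.grant([True, True]) for _ in range(4)])   # [0, 1, 0, 1]
```

With two CPUs requesting continuously, each receives every other memory cycle. This illustrates why arbitration cannot provide truly simultaneous access.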

A multiport memory provides simultaneous access to multiple external entities. Each port may 
be read/write capable, read-only, or write-only depending on the implementation and application. 
Multiport memories are generally kept relatively small, because their complexity, and hence their 
cost, increases significantly as additional ports are added, each with its own decode and control 
logic. Most multiport memories are dual-port elements as shown in Fig. 4.16. 

A true dual-port memory places no restrictions on either port’s transactions at any given time. It is 
the responsibility of the engineer to ensure that one requester does not conflict with the other. Con- 
flicts arise when one requester writes a memory location while the other is either reading or writing 
that same location. If a simultaneous read/write occurs, what data does the reader see? Is it the data 
before or after the write? Likewise, if two writes proceed at the same time, which one wins? While 
these riddles could be worked out for specific applications with custom logic, it is safer not to depend
on such corner cases. Instead, the system design should avoid such conflicts unless there is a
strong reason to the contrary. 
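The conflict rule can be stated precisely: two accesses made in the same cycle collide only if they target the same address and at least one of them is a write. Two simultaneous reads of the same location are harmless. The following behavioral check is a sketch with hypothetical names and does not describe any particular device's collision logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PortAccess:
    address: int
    write: bool   # True for a write, False for a read


def conflicts(a: Optional[PortAccess], b: Optional[PortAccess]) -> bool:
    """True if same-cycle accesses on ports A and B collide."""
    if a is None or b is None:        # an idle port cannot conflict
        return False
    return a.address == b.address and (a.write or b.write)


print(conflicts(PortAccess(5, write=True), PortAccess(5, write=False)))   # True
print(conflicts(PortAccess(5, write=False), PortAccess(5, write=False)))  # False
```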

One common application of a dual-port memory is sharing information between two micropro- 
cessors as shown in Fig. 4.17. A dual-port memory sits between the microprocessors and can be par- 
titioned into a separate message bin, or memory area, for each side. Bin A contains messages written 
by CPU A and read by CPU B. Bin B contains messages written by CPU B and read by CPU A. 

Notification of a waiting message is accomplished via a CPU interrupt, thereby releasing the 
CPUs from having to constantly poll the memory as they wait for messages to arrive. The entire pro- 
cess might work as follows: 

1. CPU A writes a message for CPU B into Bin A. 

2. CPU A asserts an interrupt to CPU B indicating that a message is waiting in Bin A. 

3. CPU B reads the message in Bin A. 

4. CPU B acknowledges the interrupt from CPU A. 

5. CPU A releases the interrupt to CPU B. 

An implementation like this prevents dual-port memory conflicts because one CPU will not read a 
message before it has been fully written by the other CPU and neither CPU writes to both bins. 
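The five-step exchange can be modeled in software to show why it avoids conflicts. In this sketch, each CPU writes only its own bin, and a per-direction interrupt flag gates reading. The class and its methods are hypothetical, and steps 4 and 5 (acknowledge and release) are collapsed into a single action.

```python
class DualPortMailbox:
    """Behavioral model of the two-bin message scheme. Bin A is written only by
    CPU A, Bin B only by CPU B, and irq[X] is the interrupt line to CPU X."""

    def __init__(self):
        self.bins = {"A": None, "B": None}    # Bin A carries A->B, Bin B carries B->A
        self.irq = {"A": False, "B": False}

    def send(self, sender: str, message: bytes) -> None:
        receiver = "B" if sender == "A" else "A"
        if self.irq[receiver]:
            raise RuntimeError("previous message not yet acknowledged")
        self.bins[sender] = message           # step 1: write the sender's own bin only
        self.irq[receiver] = True             # step 2: interrupt the other CPU

    def receive(self, receiver: str) -> bytes:
        sender = "A" if receiver == "B" else "B"
        if not self.irq[receiver]:
            raise RuntimeError("no message waiting")
        message = self.bins[sender]           # step 3: read the sender's bin
        self.irq[receiver] = False            # steps 4-5: acknowledge, interrupt released
        return message


mb = DualPortMailbox()
mb.send("A", b"hello")
print(mb.receive("B"))   # b'hello'
```

Because a sender refuses to overwrite its bin until the previous message has been acknowledged, the two CPUs never touch the same location at the same time.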




FIGURE 4.16 Dual-port memory. 
FIGURE 4.17 Dual microprocessor message passing architecture. 



4.8 THE FIFO 



The memory devices discussed thus far are essentially linear arrays of bits surrounded by a minimal 
quantity of interface logic to move bits between the port(s) and the array. First-in-first-out (FIFO) 
memories are special-purpose devices that implement a basic queue structure that has broad applica- 
tion in computer and communications architecture. Unlike other memory devices, a typical FIFO 
has two unidirectional ports without address inputs: one for writing and another for reading. As the 
name implies, the first data written is the first read, and the last data written is the last read. A FIFO 
is not a random access memory but a sequential access memory. Therefore, unlike a conventional
memory, once a data element has been read, it cannot be read again, because the next read will
return the next data element written to the FIFO. By their nature, FIFOs are subject to overflow and
underflow conditions. Their finite size, often referred to as depth, means that they can fill up if data
that has already been written is not read out. An overflow occurs when an attempt is 
made to write new data to a full FIFO. Similarly, an empty FIFO has no data to provide on a read re- 
quest, which results in an underflow. 

A FIFO is created by surrounding a dual-port memory array (generally SRAM, although DRAM
could be made to work for certain applications) with a write pointer, a read pointer, and control
logic as shown in Fig. 4.18. 




FIGURE 4.18 Basic FIFO architecture. 
A FIFO is not addressed in a linear fashion; rather, it is made to form a continuous ring of memory
that is addressed by the two internal pointers. The fullness of the FIFO is determined not by the 
absolute values of the pointers but by their relative values. An empty FIFO begins with its read and 
write pointers set to the same value. As entries are written, the write pointer increments. As entries 
are read, the read pointer increments as well. If the read pointer ever catches up to the write pointer 
such that the two match, the FIFO is empty again. If the read pointer fails to advance, the write 
pointer will eventually wrap around the end of the memory array and become equal to the read 
pointer. At this point, the FIFO is full and cannot accept any more data until reading resumes. Full 
and empty flags are generated by the FIFO to provide status to the writing and reading logic. Some 
FIFOs contain more detailed fullness status, such as signals that represent programmable fullness 
thresholds. 
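The pointer scheme can be modeled in software. Note that matching pointers mean "empty" at startup but "full" after a wrap, so the control logic must tell the two cases apart. A common approach, assumed here, is to let each pointer count modulo twice the depth (for a power-of-two depth, one extra pointer bit). Equal pointers then mean empty, and pointers exactly one depth apart mean full. The class name is hypothetical. Overflow and underflow are reported as Python exceptions.

```python
class RingFifo:
    """Software model of a FIFO: a memory array plus read and write pointers."""

    def __init__(self, depth: int):
        self.depth = depth
        self.mem = [None] * depth
        # Pointers count modulo 2*depth; "pointer % depth" addresses the array,
        # and the extra range records which lap around the ring each pointer is on.
        self.wr = 0
        self.rd = 0

    def count(self) -> int:
        return (self.wr - self.rd) % (2 * self.depth)

    @property
    def empty(self) -> bool:
        return self.wr == self.rd                  # same slot, same lap

    @property
    def full(self) -> bool:
        return self.count() == self.depth          # same slot, writer one lap ahead

    def write(self, value) -> None:
        if self.full:
            raise OverflowError("write to full FIFO")
        self.mem[self.wr % self.depth] = value
        self.wr = (self.wr + 1) % (2 * self.depth)

    def read(self):
        if self.empty:
            raise IndexError("read from empty FIFO (underflow)")
        value = self.mem[self.rd % self.depth]
        self.rd = (self.rd + 1) % (2 * self.depth)
        return value


f = RingFifo(4)
for i in range(4):
    f.write(i)
print(f.full, f.read(), f.count())   # True 0 3
```

A programmable fullness threshold, as mentioned above, would simply compare `count()` against a configured value.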

The interfaces of a FIFO can be asynchronous (no clock) or synchronous (with a clock). If syn- 
chronous, the two ports can be designed to operate with a common clock or different clocks. Al- 
though older asynchronous FIFOs are still manufactured, synchronous FIFOs are now more 
common. Synchronous FIFOs have the advantage of improved interface timing, because flops 
placed at a device’s inputs and outputs reduce timing requirements to the familiar setup, hold, and 
clock-to-out specifications. Without such a registered interface, timing specifications become a func- 
tion of the device’s internal logic paths. 

One common role that a FIFO fills is in clock domain crossing. In such an application, there is a 
need to communicate a series of data values from a block of logic operating on one clock to another 
block operating on a different clock. Exchanging data between clock domains requires special atten- 
tion, because there is normally no way to perform a conventional timing analysis across two differ- 
ent clocks to guarantee adequate setup and hold times at the destination flops. Either an 
asynchronous FIFO or a dual-clock synchronous FIFO can be used to solve this problem, as shown 
in Fig. 4.19. 

The dual-port memory at the heart of the FIFO is an asynchronous element that can be accessed 
by the logic operating in either clock domain. A dual-clock synchronous FIFO is designed to handle 
arbitrary differences in the clocks between the two halves of the device. When one or more bytes are 
written on clock A, the write-pointer information is carried safely across to the clock B domain 
within the FIFO via inter-clock domain synchronization logic. This enables the read-control inter- 
face to determine that there is data waiting to be read. Logic on clock B can read this data long after 
it has been safely written into the memory array and allowed to settle to a stable state. 
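The text does not say how the inter-clock domain synchronization logic is built. A widely used technique, assumed here, is to convert the write pointer to Gray code before passing it through synchronizer flops. Because successive Gray values differ in exactly one bit, a pointer sampled mid-transition in the other clock domain resolves to either the old or the new value, never to an unrelated number. A short sketch of the conversion follows.

```python
def bin_to_gray(n: int) -> int:
    """Reflected binary (Gray) code: successive values differ in one bit."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Inverse conversion, done by the receiving clock domain."""
    n = 0
    while g:
        n ^= g      # XOR of g, g>>1, g>>2, ... recovers the binary value
        g >>= 1
    return n


# Check the single-bit-change property for a 4-bit pointer, including wraparound.
WIDTH = 4
for i in range(2 ** WIDTH):
    changed = bin_to_gray(i) ^ bin_to_gray((i + 1) % 2 ** WIDTH)
    assert bin(changed).count("1") == 1
print("every increment changes exactly one bit")
```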

FIGURE 4.19 Clock domain crossing with synchronous FIFO.

Another common application for a FIFO is rate matching, where a particular data source is bursty
and the data consumer accepts data at a more regular rate. One example is a situation where a
sequence of data is stored in DRAM and needs to be read out and sent over a communications
interface one byte at a time. The DRAM is shared with a CPU that competes with the communications
interface for memory bandwidth. As discussed earlier, DRAMs are most efficient when operated in
page-mode bursts. Therefore, rather than perform a complete read transaction each time a single byte
is needed for the communications interface, a burst of data can be read and stored in a FIFO. Each
time the interface is ready for a new byte, it reads it from the FIFO. In this case, only a single-clock
FIFO is required, because both sides of the FIFO operate in a common clock domain. To keep this
process running smoothly, control logic is needed to watch the state of the FIFO and perform a new
burst read from DRAM when the FIFO begins to run low on data. This scheme is illustrated in Fig. 4.20. 

For data-rate matching to work properly, the average bandwidth over time of the input and output 
ports of the FIFO must be equal, because FIFO capacity is finite. If data is continuously written 
faster than it can be read, the FIFO will eventually overflow and lose data. Conversely, if data is con- 
tinuously read faster than it can be written, the FIFO will underflow and cause invalid bytes to be in- 
serted into the outgoing data stream. The depth of a FIFO indicates how large a read/write rate 
disparity can be tolerated without data loss. This disparity is expressed as the product of rate mis- 
match and time. A small mismatch can be tolerated for a longer time, and a greater rate disparity can 
be tolerated for a shorter time. 
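Because the tolerable disparity is the product of rate mismatch and time, the minimum depth for a given episode follows directly. In this minimal sketch, rates are in entries per unit time, and the duration uses the same time unit. The function name and example numbers are hypothetical.

```python
def required_depth(write_rate: float, read_rate: float, duration: float) -> float:
    """Entries that accumulate while writes outpace reads for `duration`.

    A negative mismatch means the FIFO drains instead of filling, so no
    extra depth is needed for that interval."""
    return max(0.0, (write_rate - read_rate) * duration)


# A large mismatch for a short time and a small mismatch for a long time
# can demand the same depth (hypothetical numbers, entries per microsecond):
print(required_depth(32.0, 1.0, 2.0))    # 62.0
print(required_depth(1.5, 1.0, 124.0))   # 62.0
```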

In the rate-matching example, a large rate disparity of brief duration is balanced by a small rate 
disparity of longer duration. When the DRAM is read, a burst of data is suddenly written into the 
FIFO, creating a temporarily large rate disparity. Over time, the communications interface reads one 
byte at a time while no writes are taking place, thereby compensating with a small disparity over 
time. 

DRAM reads to refill the FIFO must be carefully timed to simultaneously prevent overflow and 
underflow conditions. A threshold of FIFO fullness needs to be established below which a DRAM 
read is triggered. This threshold must guarantee that there is sufficient space available in the FIFO to 
accept a full DRAM burst, avoiding an overflow. It must also guarantee that under the worst-case re- 
sponse time of the DRAM, enough data exists in the FIFO to satisfy the communications interface, 
avoiding an underflow. In most systems, the time between issuing a DRAM read request and actu- 
ally getting the data is variable. This variability is due to contention with other requesters (e.g., the 
CPU) and waiting for overhead operations (e.g., refresh) to complete. 
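The two constraints on the refill threshold can be written as a lower and an upper bound. This sketch relies on simplifying assumptions that are not in the text: the consumer removes one byte every `byte_period_ns`, the DRAM request is issued as soon as the fill level falls to the threshold, and the whole burst lands at once. The function name and numbers are hypothetical.

```python
import math
from typing import Optional, Tuple


def refill_threshold_range(depth: int, burst_len: int,
                           worst_latency_ns: float,
                           byte_period_ns: float) -> Optional[Tuple[int, int]]:
    """Return (lowest, highest) fill level that may trigger a DRAM refill,
    or None if no threshold satisfies both constraints."""
    # Underflow bound: enough bytes must remain to cover the worst-case DRAM latency.
    lowest = math.ceil(worst_latency_ns / byte_period_ns)
    # Overflow bound: even if the DRAM answers immediately, the burst must fit.
    highest = depth - burst_len
    return (lowest, highest) if lowest <= highest else None


# 64-byte FIFO, 32-byte bursts, 2 us worst-case latency, one byte every 100 ns:
print(refill_threshold_range(64, 32, 2000.0, 100.0))   # (20, 32)
print(refill_threshold_range(32, 32, 2000.0, 100.0))   # None: FIFO too shallow
```

If the function returns None, the FIFO must be made deeper, the bursts shorter, or the worst-case latency reduced.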
FIGURE 4.20 Synchronous FIFO application: data rate matching. 








CHAPTER 5 

Serial Communications 



Serial communication interfaces are commonly used to exchange data with other computers. Serial
interfaces are ubiquitous, because they require relatively few wires and are therefore economical to
implement over long distances. Many types of serial interfaces have been developed,
with speeds ranging up to billions of bits per second. Regardless of the bit rate, serial communication 
interfaces share many common traits. This chapter introduces the fundamentals of serial communi- 
cation in the context of popular data links such as RS-232 and RS-485 in which bandwidths and 
components lend themselves to basic circuit fabrication techniques. 

The chapter first deals with the basic parallel-to-serial-to-parallel conversion process that is at the 
heart of all serial communication. Wide buses must be serialized at the transmitter and reconstructed 
at the receiver. Techniques for accomplishing this vary with the specific type of data link, but basic 
concepts of framing and error detection are universal. 

Two widely deployed point-to-point serial communication standards, RS-232 and RS-422, are
presented, along with the standard ASCII character set, to show how theory meets practice. Standards
are important to communications in general because of the need to connect disparate equipment.
ASCII is one of the most fundamental data representation formats with global recognition. RS-232
has traditionally been found in many digital systems, because it is a reliable standard. Understanding
RS-232, its relative RS-422, and ASCII enables an engineer to design a communication interface
that can work with a vast range of complementary equipment, from computers to
modems to off-the-shelf peripherals. 

Systems may require more advanced communication schemes to enable data exchange between 
many nodes. Networks enable such communication and can range in complexity according to an ap- 
plication’s requirements. Networking adds a new set of fundamental concepts on top of basic serial 
communication. Topics including network topologies and packet formats are presented to explain 
how networks function at a basic hardware and software level. Once networking fundamentals have 
been discussed, the RS-485 standard is introduced to show how a simple and fully functional net- 
work can be constructed. A complete network design example using RS-485 is offered with explana- 
tions of why various design points are included and how they contribute to the network’s overall 
operation. 

The chapter closes with a presentation of small-scale networking employed within a digital sys- 
tem to economically connect peripherals to a microprocessor. Interchip networks are of such narrow 
scope that they are usually not referred to as networks, but they can possess many fundamental prop- 
erties of a larger network. Peripherals with low microprocessor bandwidth requirements can be con- 
nected using a simple serial interface consisting of just a few wires, as compared to the full 
complexity of a parallel bus. 
5.1 SERIAL VS. PARALLEL COMMUNICATION 



Most logical operations and data processing occur in parallel on multiple bits simultaneously. Mi- 
croprocessors, for example, have wide data buses to increase throughput. With wide buses comes a 
requirement for more wires to connect the logical elements in a system. The interconnection penalty 
increases as distances increase. Within a chip, the penalty is small, and wide buses are common. Im- 
plementing wide buses on a circuit board is also common because of the relatively short distances 
involved. 

The economics and technical context of interconnect change as soon as the distances grow from 
centimeters to meters to kilometers. Communication is primarily concerned with transporting data 
from one location to another rather than processing that information as it is carried on a wire. With 
distance comes the expensive problem of stringing a continuous wire between two locations. 
Whether the wire is threaded through a conduit between floors in an office, buried under the street 
between buildings, or virtually constructed via radio transmission to a satellite, the cost and com- 
plexity of connecting multiple wires is many orders of magnitude greater than on a circuit board. Se- 
rial communication is well suited to long distances, because fewer wires are used as compared to a 
parallel bus. A serial data link implies a single-wire medium, but there can be multiwire serial links 
as well. 

Figure 5.1 illustrates several logical components in a serial data link. At either end are the sources 
and consumers of the data that operate using a parallel bus. A transceiver converts between a parallel 
bus and a serial stream and handles any link-level timing necessary to properly send and receive 
data. A transducer, or modulator in wireless links, converts between the medium’s electromagnetic 
signaling characteristics and the transceiver’s logic-level signals. Finally, a conductive path joins the 
two transducers. This path can be copper wire, glass fiber optic cable, or free space. These logical 
components may be integrated in arbitrary physical configurations in different implementations, so 
not all serial links will consist of three specific discrete pieces. Simple links may have fewer pieces, 
and complex links may have more. 

The total cost of a data link is the sum of the cost of the transceiver/transducer subsystems at each 
end and the cost of the physical medium itself. A serial port on a desktop computer is inexpensive 
because of its relatively simple electronic circuits and because the medium over which it communi- 
cates, a short copper wire, is fairly cheap. In contrast, a satellite link is very expensive as a result of 
the greater complexity of the ground-based transmission equipment, the high cost of the satellite it- 
self, and the licensing costs of using the public airwaves. 

FIGURE 5.1 Components of a serial data link.

If only one bit is transferred per clock cycle in a serial link, it follows that either the serial bit
clock has to be substantially faster than the parallel bus, or the link’s bandwidth will be significantly
below that of the parallel bus. Bandwidth in a communication context refers to the capacity of the
communications channel, often expressed either in bits-per-second (bps) or bytes-per-second (Bps).
Serial links are available in a broad spectrum of bandwidths, from thousands of bits per second
(kbps) to billions of bits per second (Gbps) and are stretching toward trillions of bits per second
(Tbps)! 

Most implementations in the kbps range involve applications where relatively small quantities of 
data are exchanged, so the cost of deploying an advanced data link is not justified. These serial links 
are able to run at low frequencies (several hundred kilohertz and below) and therefore do not require 
complex circuitry. Of course, some low-bandwidth data links can be very expensive if the medium 
over which they operate presents extreme technical difficulties, such as communicating across inter- 
planetary distances. Implementations in the Gbps range serve applications such as high-end com- 
puter networks where huge volumes of data are carried. Such links are run at gigahertz frequencies 
and are relatively costly due to this high level of performance. Gigahertz serial transfer rates do not 
translate into similar logic clock frequencies. When a transceiver converts a serial data stream into a 
parallel bus, it contains the very high frequency complexity within itself. A 1-Gbps link requires 
only a 31.25 MHz clock when using a 32-bit data path. 
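The clock arithmetic is simply the serial bit rate divided by the parallel bus width. This minimal sketch ignores any framing or line-coding overhead a real link may add, which is an assumption on our part. The function name is hypothetical.

```python
def parallel_clock_hz(bit_rate_bps: float, bus_width_bits: int) -> float:
    """Word clock needed on the parallel side of a serializer/deserializer,
    ignoring framing and line-coding overhead."""
    return bit_rate_bps / bus_width_bits


print(parallel_clock_hz(1e9, 32) / 1e6, "MHz")   # 31.25 MHz for a 1-Gbps link
```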



5.2 THE UART 



The universal asynchronous receiver/transmitter (UART) is a basic transceiver element that serial- 
izes a parallel bus when transmitting and deserializes the incoming stream when receiving. In addi- 
tion to bus-width conversion, the UART also handles overhead and synchronization functions 
required to transport data. Data bits cannot simply be serialized onto a wire without some additional 
information to delineate the start and end of each unit of data. This delineation is called framing. The 
receiver must be able to recognize the start of a byte so that it can synchronize its shift registers and 
receive logic to properly capture the data. Basic framing is accomplished with a start bit that is
assigned a logic state opposite to that of the transmission medium’s idle state, which is often logic 1
for historical reasons. When no data is being sent, the transmission medium, typically a wire, may be
driven to logic 1. A logic 0 start bit signals the receiver that data is on the way. The receiving UART
must be configured for the same number of data bits that the transmitter sends; most UARTs support
either seven or eight data bits. After the configured number of data bits has been captured following
the start bit, the UART knows that the data unit is complete and can resume waiting for a new start
bit. One or more stop bits follow to provide a minimum delay between successive data units so that
the receiver can complete processing of the current datum before receiving the next one. 

Many UARTs also support error detection in the form of a parity bit. The parity bit is derived from
the XOR of the data bits and is sent along with the data so that it can be recalculated and verified at
the receiver. Error detection is considered more important on a long-distance data link than on a
circuit board, because errors are more likely over longer distances. A parity bit is added to each data
unit, most often each byte, to tell the receiver whether an odd or even number of 1s are in the data
word. The receiver and transmitter must be configured to agree on whether even or odd parity is
being implemented. Even parity is calculated by XORing all data bits, and odd parity is calculated
by inverting even parity. The result is that, for even parity, the parity bit will be set if there are an odd
number of 1s in the byte, so that the total count of 1s, including the parity bit, is even. Conversely, the
parity bit will be cleared if there are an even number of 1s present. Odd parity is just the opposite, as
shown in Fig. 5.2. 
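The parity rules above translate directly into code. The function names in this minimal sketch are hypothetical.

```python
def even_parity_bit(value: int, data_bits: int = 8) -> int:
    """XOR of the data bits: 1 when the data holds an odd number of 1s,
    so that data plus parity always contains an even number of 1s."""
    p = 0
    for i in range(data_bits):
        p ^= (value >> i) & 1
    return p


def odd_parity_bit(value: int, data_bits: int = 8) -> int:
    """Odd parity is the inverse of even parity."""
    return even_parity_bit(value, data_bits) ^ 1


for byte in (0xA0, 0x67):
    print(hex(byte), "even:", even_parity_bit(byte), "odd:", odd_parity_bit(byte))
```

0xA0 contains two 1s, so its even-parity bit is 0. 0x67 contains five 1s, so its even-parity bit is 1. The receiver repeats the same calculation on the received data and flags an error if its result disagrees with the received parity bit. A single flipped bit is always caught, but two flipped bits cancel out and go undetected.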

FIGURE 5.2 Odd and even parity.

Handshaking is another common feature of UARTs. Handshaking, also called flow control, is the
general process whereby two ends of a data link confirm that each is ready to exchange data before
the actual exchange occurs. The process can use hardware or software signaling. Hardware
handshaking involves a receiver driving a ready signal to the transmitter. The transmitter sends data only
when the receiver signals that it is ready. UARTs may support hardware handshaking. Any software
handshaking is the responsibility of the UART control program. 

Software handshaking works by transmitting special binary codes that either pause or resume the 
opposite end as it sends data. XON/XOFF handshaking is a common means of implementing software
flow control. When one end of the link is ready to accept data, it transmits a standard character
called XON (0x11) to the opposite device. When the receiver has filled a buffer and is unable to accept
more data, an XOFF character (0x13) is transmitted. Most flow control schemes depend on good
behavior: the device that receives an XOFF must respect the signal and pause its
transmission until an XON is received. It is not uncommon to see an XON/XOFF setting in certain 
serial terminal configurations. 
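On the transmitting side, XON/XOFF reduces to a single "paused" flag that incoming control characters set and clear. In the following minimal sketch, the class, its queue, and its method names are hypothetical and are not taken from any real serial driver.

```python
XON = 0x11    # DC1: resume transmission
XOFF = 0x13   # DC3: pause transmission


class XonXoffTransmitter:
    """Transmit side of software flow control: stop sending after XOFF,
    resume after XON."""

    def __init__(self, data: bytes):
        self.queue = list(data)
        self.paused = False

    def on_received(self, byte: int) -> None:
        """Called for each byte arriving from the far end of the link."""
        if byte == XOFF:
            self.paused = True
        elif byte == XON:
            self.paused = False

    def next_to_send(self):
        """Return the next byte to transmit, or None while paused or empty."""
        if self.paused or not self.queue:
            return None
        return self.queue.pop(0)


tx = XonXoffTransmitter(b"abc")
print(tx.next_to_send())   # 97 ('a')
tx.on_received(XOFF)
print(tx.next_to_send())   # None: honoring the pause request
```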

A generic UART is shown in Fig. 5.3. The UART is divided into three basic sections: CPU inter- 
face, transmitter, and receiver. The CPU interface contains various registers to configure parity, bit rate, 
handshaking, and interrupts. UARTs usually provide three parity options: none, even, and odd. Bit rate 
is selected by programming an internal counter to divide an external reference clock by a chosen value. 
The range of usable bit clocks may be from several hundred bits per second to over 100 kbps. 
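The following sketch shows how such a divisor might be chosen. It assumes, as in the widely used 16550-style UARTs, that the counter output is a 16x bit clock (matching the 16x receive clock discussed later in this section) and that a 1.8432 MHz reference is used. Other parts may differ. The function name is hypothetical.

```python
from typing import Tuple


def baud_divisor(ref_clock_hz: float, baud: int, oversample: int = 16) -> Tuple[int, float]:
    """Return (divisor, error_percent) for a UART whose bit rate is
    ref_clock_hz / (oversample * divisor)."""
    divisor = round(ref_clock_hz / (oversample * baud))
    if divisor < 1:
        raise ValueError("reference clock too slow for this bit rate")
    actual = ref_clock_hz / (oversample * divisor)
    return divisor, 100.0 * (actual - baud) / baud


for rate in (300, 9600, 115200):
    print(rate, baud_divisor(1_843_200, rate))   # 1.8432 MHz divides evenly
print(9600, baud_divisor(16_000_000, 9600))      # 16 MHz leaves a small error
```

This is why oddly valued crystals such as 1.8432 MHz are common in UART designs: they divide exactly into the standard bit rates. A rounder reference clock leaves a small rate error, which the mid-bit sampling scheme described later can usually absorb.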

FIGURE 5.3 Generic UART block diagram.

Interrupts are used to inform the CPU when a new byte has been received and when the UART is
ready to accept a new byte for transmission. This saves the CPU from having to constantly poll the
UART’s status registers for this information. However, UARTs also provide status bits that report the
same conditions, so a simple serial driver program could operate by polling rather than implementing
an interrupt service routine. Aside from general control and status registers, the CPU interface
provides access to transmit and receive buffers so that data can be queued for transmission and
retrieved upon arrival. Depending on the UART, these buffers may be only one byte each, or they
may be several bytes implemented as a small FIFO. Typically, these serial ports run slowly enough
that deep buffers are not required, because even a slow CPU can easily respond to a transmit/receive
event before the data link underruns the transmit buffer or overruns the receive buffer. 

The transmit section implements a parallel-to-serial shift register, parity generator, and framing
logic. UARTs support framing with a start bit and one or two stop bits, where the start bit is a logic 0
and stop bits are logic 1s. It is also common to transmit data LSB first. With various permutations of
framing options, parity protection, and seven or eight data bits, standard configuration notation is of
the form <parity:N/E/O>-<width:8/7>-<stop-bits:1/2>. For example, N-8-1 represents no parity, 8
data bits, and 1 stop bit. E-8-2 represents even parity, 8 data bits, and 2 stop bits. To help understand
the format of bytes transmitted by a UART, consider Fig. 5.4. Here, two data bytes are transmitted:
0xA0 and 0x67. Keep in mind that the LSB is transmitted first. 
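The framing rules can be captured in a short function that returns the line levels for one character in transmission order. The function names are hypothetical, and the configuration string follows the <parity>-<width>-<stop bits> notation above.

```python
from typing import List


def frame_byte(value: int, fmt: str = "N-8-1") -> List[int]:
    """Line levels for one UART character, in the order they are sent."""
    parity, width_s, stops_s = fmt.split("-")
    width, stops = int(width_s), int(stops_s)
    data = [(value >> i) & 1 for i in range(width)]   # LSB first
    bits = [0] + data                                 # start bit is logic 0
    if parity in ("E", "O"):
        p = sum(data) % 2                             # even parity = XOR of the data bits
        bits.append(p if parity == "E" else p ^ 1)
    elif parity != "N":
        raise ValueError("parity must be N, E, or O")
    bits += [1] * stops                               # stop bits are logic 1
    return bits


def as_string(bits: List[int]) -> str:
    return "".join(str(b) for b in bits)


print(as_string(frame_byte(0xA0)) + as_string(frame_byte(0x67)))
# 00000010110111001101
print(as_string(frame_byte(0xA0, "E-8-2")) + as_string(frame_byte(0x67, "E-8-2")))
# 000000101011011100110111
```

The two output strings match the N-8-1 and E-8-2 bit sequences shown in Fig. 5.4.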

Receiving the serial data is a bit trickier than transmitting it, because there is no clock accompa- 
nying the data with which the data can be sampled. This is where the asynchronous terminology in 
the UART acronym comes from. The receiver contains a clock synchronization circuit that detects 
the start-bit and establishes a timing reference point from which all subsequent bits in the byte will 
be sampled. This reference point is created using a higher-frequency receive clock. Rather than running
the receiver at 1x the bit rate, it may be run at 16x the bit rate. Now the receive logic can decompose
a bit into 16 time units and slide a 16-clock window according to where the start bit is 
observed. It is advantageous to sample each subsequent bit halfway through its validity window for 
maximum timing margin on either side of the sampling event. This allows maximum flexibility for 
settling time around the edges of the electrical pulse that defines each bit. 

Consider the waveform in Fig. 5.5. When the start bit is detected, the sampling window is reset, 
and a sampling point halfway through is established. Subsequent bits can have degraded rising and 
falling edges without causing the receiver to sample an incorrect logic level. 
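The synchronization scheme can be illustrated with a software receiver that works on 16x samples of the line. It finds the falling edge of the start bit, moves 8 samples ahead to the middle of that bit, and then takes one sample every 16 ticks. The midpoint check on the start bit, which rejects short glitches, is a common refinement that we assume here. The function names are hypothetical, and parity is omitted for brevity.

```python
from typing import List, Tuple

OVERSAMPLE = 16  # receive clock runs at 16x the bit rate


def oversample(bits: List[int], idle_ticks: int = 5) -> List[int]:
    """Test helper: expand a frame's bit levels into 16x line samples,
    surrounded by idle (logic 1) time."""
    samples = [1] * idle_ticks
    for b in bits:
        samples.extend([b] * OVERSAMPLE)
    samples.extend([1] * OVERSAMPLE)
    return samples


def receive_byte(samples: List[int], data_bits: int = 8) -> Tuple[int, bool]:
    """Recover one no-parity character. Returns (value, stop_bit_ok)."""
    # Hunt for the idle-to-start transition (a 1 followed by a 0).
    i = 1
    while i < len(samples) and not (samples[i - 1] == 1 and samples[i] == 0):
        i += 1
    if i >= len(samples):
        raise ValueError("no start bit found")
    mid = i + OVERSAMPLE // 2          # halfway through the start bit
    if samples[mid] != 0:
        raise ValueError("start bit not confirmed at midpoint (glitch)")
    value = 0
    for n in range(data_bits):
        mid += OVERSAMPLE              # halfway through the next bit
        value |= samples[mid] << n     # LSB arrives first
    mid += OVERSAMPLE                  # halfway through the stop bit
    return value, samples[mid] == 1


frame = [0] + [(0xA0 >> n) & 1 for n in range(8)] + [1]   # N-8-1 frame for 0xA0
print(receive_byte(oversample(frame)))                     # (160, True)
```

Because each bit is sampled near the center of its window, several corrupted samples at either edge of a bit do not change the result. This corresponds to the tolerance of degraded rising and falling edges shown in Fig. 5.5.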



N-8-1:  start | 0xA0     | stop | start | 0x67     | stop
        0     | 00000101 | 1    | 0     | 11100110 | 1

E-8-2:  start | 0xA0     | parity | stop stop | start | 0x67     | parity | stop stop
        0     | 00000101 | 0      | 1 1       | 0     | 11100110 | 1      | 1 1

FIGURE 5.4 Common byte framing formats. 
FIGURE 5.5 UART receive clock synchronization.



5.3 ASCII DATA REPRESENTATION 



Successful communication requires standardized data representation so that people and computers 
around the world can share the same information. Alphanumeric characters are represented by a 
seven-bit standard representation known as the American Standard Code for Information Interchange,
or ASCII. ASCII also includes punctuation marks and invisible control codes used to help in
the display and transfer of data. ASCII was first published in 1963, and the 1968 revision that forms
the basis of the version in common use was issued by the organization now known as the American
National Standards Institute, or ANSI. The original ASCII standard lacked provisions for many
commonly used grammatical symbols in languages other than English. Since 1968, there have been
many extensions to ASCII that have varying support throughout the world according to the prevalent
language in each country. In the United States, an eight-bit ASCII variant is commonly supported that
adds graphical symbols and some of the more common foreign language punctuation symbols. The
original seven-bit ANSI standard ASCII mapping is shown in Table 5.1. The mappings below 0x20 are
invisible control codes such as tab (0x09), carriage return (0x0D), and line-feed (0x0A). Some of the
control codes are not in widespread use anymore. 
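The split between control codes and printable characters is easy to test in software. This minimal sketch uses Python's built-in `ord`, `chr`, and `str.encode`. The function name is hypothetical. Note that DEL (0x7F) is the one control code that sits above the printable range.

```python
def describe(code: int) -> str:
    """Classify a 7-bit ASCII code as a control code or a printable character."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    if code < 0x20 or code == 0x7F:      # 0x00-0x1F and DEL are control codes
        return f"control 0x{code:02X}"
    return f"printable {chr(code)!r}"


print(describe(ord("A")))               # printable 'A'
print(describe(0x09))                   # control 0x09 (tab)
print("\r\n".encode("ascii").hex())     # 0d0a: carriage return, line feed
```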



5.4 RS-232 



Aside from a common data representation format, communication signaling such as framing or error 
detection also requires standardization so that equipment manufactured by different companies can 
exchange information. When one begins discussing communications, an unstoppable journey into 
the sometimes mysterious world of industry standards begins. Navigating these standards can be 
tricky because of subtle differences in terminology between related standards and the everyday jar- 
gon to which the engineering community has grown accustomed. Standards are living documents 
that are periodically updated, revised, or replaced. This shifting base of documentation can add other 
challenges to fully complying with a standard. 

One of the most ubiquitous serial communications schemes in use is defined by the RS-232 fam- 
ily of standards. Most UARTs are designed specifically to support RS-232. Standards purists may 
balk at the common reference to RS-232 in the modern context, for several reasons. First, the origi- 
nal RS-232 document has long since been superseded by multiple revisions. Second, its name was 
changed first to EIA-232, then to EIA/TIA-232. And third, RS-232 is but one of a set of related stan- 
dards that address asynchronous serial communications. These standards have been developed under 
the auspices of the Electronics Industry Alliance (formerly the Electronics Industry Association) and 
Telecommunications Industry Association. Technically, EIA/TIA-232 (first introduced in 1962 as 
RS-232) standardizes the 25-pin D-subminiature (DB25) connector and pin assignment along with 
an obsolete electrical specification that had limited range. EIA/TIA-423 standardizes the modern 
electrical characteristics that enable communication at speeds up to 100 kbps over short distances 
(10 m). EIA/TIA-574 standardizes the popular nine-pin DE9 connector that is used on most new
“RS-232”-equipped devices. These days, when most people talk about an RS-232 port, they are
referring to the overall RS-232 family of related serial interfaces. In fairness to standards purists, this
loose terminology is partially responsible for confusion among those who implement and use RS- 
232. From a practical perspective, however, it is most common to use the term RS-232 with addi- 
tional qualifiers (e.g., 9-pin or 25-pin) to convey your point. In fact, if you start mentioning EIA/ 
TIA-574 and 423, you will probably be met by blank stares from most engineers. This somewhat 
shady practice is continued here because of its widespread acceptance in industry. 

RS-232 specifies that the least-significant bit of a byte is transmitted first and is framed by a single
start bit and one or two stop bits. Common RS-232 data rates are known to many computer users.
Many standard bit rates are 2^N multiples of 300 bps. In the 1970s, 300-bps serial links were common.
During the 1980s, links went from 1,200 to 2,400 to 9,600 bps. RS-232 data links now operate at
speeds from 19.2 to 153.6 kbps. Standard RS-232 bit rates are typically divided down from reference
clocks such as 1.8432, 3.6864, 6.144, and 11.0592 MHz. This explains why many microprocessors
operate at oddball frequencies instead of even speeds such as 5, 10, or 12 MHz.

TABLE 5.1 Seven-bit ASCII Character Mapping

Decimal  Hex   Value      Decimal  Hex   Value  Decimal  Hex   Value  Decimal  Hex   Value
0        0x00  NUL        32       0x20  SP     64       0x40  @      96       0x60  `
1        0x01  SOH        33       0x21  !      65       0x41  A      97       0x61  a
2        0x02  STX        34       0x22  "      66       0x42  B      98       0x62  b
3        0x03  ETX        35       0x23  #      67       0x43  C      99       0x63  c
4        0x04  EOT        36       0x24  $      68       0x44  D      100      0x64  d
5        0x05  ENQ        37       0x25  %      69       0x45  E      101      0x65  e
6        0x06  ACK        38       0x26  &      70       0x46  F      102      0x66  f
7        0x07  BEL        39       0x27  '      71       0x47  G      103      0x67  g
8        0x08  BS         40       0x28  (      72       0x48  H      104      0x68  h
9        0x09  HT         41       0x29  )      73       0x49  I      105      0x69  i
10       0x0A  LF         42       0x2A  *      74       0x4A  J      106      0x6A  j
11       0x0B  VT         43       0x2B  +      75       0x4B  K      107      0x6B  k
12       0x0C  FF         44       0x2C  ,      76       0x4C  L      108      0x6C  l
13       0x0D  CR         45       0x2D  -      77       0x4D  M      109      0x6D  m
14       0x0E  SO         46       0x2E  .      78       0x4E  N      110      0x6E  n
15       0x0F  SI         47       0x2F  /      79       0x4F  O      111      0x6F  o
16       0x10  DLE        48       0x30  0      80       0x50  P      112      0x70  p
17       0x11  DC1/XON    49       0x31  1      81       0x51  Q      113      0x71  q
18       0x12  DC2        50       0x32  2      82       0x52  R      114      0x72  r
19       0x13  DC3/XOFF   51       0x33  3      83       0x53  S      115      0x73  s
20       0x14  DC4        52       0x34  4      84       0x54  T      116      0x74  t
21       0x15  NAK        53       0x35  5      85       0x55  U      117      0x75  u
22       0x16  SYN        54       0x36  6      86       0x56  V      118      0x76  v
23       0x17  ETB        55       0x37  7      87       0x57  W      119      0x77  w
24       0x18  CAN        56       0x38  8      88       0x58  X      120      0x78  x
25       0x19  EM         57       0x39  9      89       0x59  Y      121      0x79  y
26       0x1A  SUB        58       0x3A  :      90       0x5A  Z      122      0x7A  z
27       0x1B  ESC        59       0x3B  ;      91       0x5B  [      123      0x7B  {
28       0x1C  FS         60       0x3C  <      92       0x5C  \      124      0x7C  |
29       0x1D  GS         61       0x3D  =      93       0x5D  ]      125      0x7D  }
30       0x1E  RS         62       0x3E  >      94       0x5E  ^      126      0x7E  ~
31       0x1F  US         63       0x3F  ?      95       0x5F  _      127      0x7F  DEL
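The framing and clock-division arithmetic above can be sketched as follows. This is a minimal illustration, not a UART implementation: `frame_byte` assumes eight data bits and no parity, and `baud_divisor` assumes a UART that samples each bit 16 times (16x oversampling, common in many UARTs), so the bit rate is clock / (16 × divisor). Both function names are hypothetical.

```python
def frame_byte(value: int, data_bits: int = 8, stop_bits: int = 1) -> list[int]:
    """Logic-level bit sequence for one asynchronous character:
    start bit (0), data bits least-significant first, then stop bit(s) (1).
    No parity bit is generated in this sketch."""
    if not 0 <= value < (1 << data_bits):
        raise ValueError("value does not fit in data_bits")
    if stop_bits not in (1, 2):
        raise ValueError("stop_bits must be 1 or 2")
    data = [(value >> i) & 1 for i in range(data_bits)]  # LSB first
    return [0] + data + [1] * stop_bits


def baud_divisor(clock_hz: int, bit_rate: int, oversample: int = 16) -> tuple[int, float]:
    """Nearest integer divisor for a UART whose bit rate is
    clock_hz / (oversample * divisor), plus the resulting rate error in percent."""
    divisor = max(1, round(clock_hz / (oversample * bit_rate)))
    actual = clock_hz / (oversample * divisor)
    return divisor, 100.0 * (actual - bit_rate) / bit_rate


print(frame_byte(ord("A")))             # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(baud_divisor(1_843_200, 9_600))   # (12, 0.0)
print(baud_divisor(10_000_000, 115_200))
```

The last call shows why an "even" 10-MHz clock is awkward: the nearest divisor yields 125 kbps instead of 115.2 kbps, an error of roughly 8.5 percent, whereas 1.8432 MHz and 11.0592 MHz divide down exactly.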

RS-232 defines signals from two different perspectives: data communications equipment (DCE) 
and data terminal equipment (DTE). DCE/DTE terminology evolved in the early days of computing 
when the common configuration was to have a dumb terminal attached to a modem of some sort to 
enable communication with a mainframe computer in the next room or building. A person would sit 
at the DTE and communicate via the DCE. Therefore, in the early 1960s, it made perfect sense to 
create a communication standard that specifically addressed this common configuration. By defining 
a set of DTE and DCE signals, not only could terminal and modem engineers design compatible sys- 
tems, but cabling would be very simple: just wire each DTE signal straight through to each DCE sig- 
nal. To further reduce confusion, the DTE was specified as a male DB-25 and the DCE as a female 
DB-25. Aside from transmit and receive data, hardware handshaking signals distinguish DCE from 
DTE. Some signals are specific to modems such as carrier detect and ring indicator and are still 
used today in many modem applications. 

The principle behind RS-232 hardware handshaking is fairly simple: the DTE and DCE indicate 
their operational status and ability to accept data. The four main handshaking signals are request to 
send (RTS), clear to send (CTS), data terminal ready (DTR), and data set ready (DSR). DTR/DSR 
enable the DTE and DCE to signal that they are both operational. The DTE asserts DTR, which is 
sensed by the DCE and vice versa with DSR. RTS/CTS enable actual data transfer. RTS is asserted 
by the DTE to signal that the DCE can send it data. CTS is asserted by the DCE to signal the DTE 
that it can send data. In the case of a modem, carrier detect is asserted to signal an active connection, 
and ring indicator is asserted when the telephone line rings, signaling that the DTE can instruct the 
modem to answer the phone. 

In a null-modem configuration, two DTEs are connected, and each treats DTR and RTS as outputs and
DSR and CTS as inputs. This is solved by crossing TXD/RXD and swapping DTR/DSR and RTS/CTS so that one
DTE’s DTR drives the other’s DSR, and so on. The unidirectional carrier detect is also connected to 
the DTR signal at the other end (DSR at the local end) to provide positive “carrier detect” when the 
terminal ready signal is asserted. 
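The null-modem cross-connections can be modeled as a small wiring map and checked mechanically. In this sketch, the dictionary `NULL_MODEM` and the function `check_null_modem` are hypothetical names; signal names follow Table 5.2. Because a null-modem cable is symmetrical, the same map describes both ends. Signal ground, which is wired straight through, is not modeled.

```python
# Each entry maps an output on one DTE to the input(s) it drives on the other DTE.
NULL_MODEM = {
    "TXD": ["RXD"],
    "RTS": ["CTS"],
    "DTR": ["DSR", "DCD"],   # DTR also supplies a positive "carrier detect"
}
DTE_OUTPUTS = {"TXD", "RTS", "DTR"}
DTE_INPUTS = {"RXD", "CTS", "DSR", "DCD"}


def check_null_modem(wiring: dict) -> set:
    """Verify that every link runs from a DTE output to DTE inputs and that
    no input is driven twice. Returns the set of inputs that are driven."""
    driven = set()
    for out, ins in wiring.items():
        if out not in DTE_OUTPUTS:
            raise ValueError(f"{out} is not a DTE output")
        for sig in ins:
            if sig not in DTE_INPUTS:
                raise ValueError(f"{out} drives {sig}, which is not a DTE input")
            if sig in driven:
                raise ValueError(f"{sig} is driven more than once")
            driven.add(sig)
    return driven


print(sorted(check_null_modem(NULL_MODEM)))   # ['CTS', 'DCD', 'DSR', 'RXD']
```

A straight-through map such as `{"TXD": ["TXD"]}` is rejected, which mirrors the real problem of two DTE transmitters driving each other.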

Table 5.2 lists the full set of RS-232 signals with the convention that signals are named relative to 
the DTE. Most of the original 25 defined RS-232 signals are rarely used, as evidenced by the popu- 
larity of the smaller DE9 connector. Furthermore, a minimal RS-232 serial link can be implemented 
with only three wires: transmit, receive, and ground. In more recent times, the DTE/DCE distinction 
has created confusion in more than one engineering department, because the definitions of terminal 
and modem do not always hold in the more varied modern digital systems context. Often, all RS-232 
ports are configured as DTE, and special crossover, or null-modem, cables are used to properly con- 
nect two DTEs. While varying subsets of the DTE pin assignment can be found in many systems, 
there is still a place for the original DTE/DCE configuration. It is rare, however, to find the DB25 
pins that are not implemented in the DE9 actually put to use. 

Not all RS-232 interfaces are configured for hardware handshaking. Some may ignore these sig- 
nals entirely, and others require that these signals be tied off to the appropriate logic levels so that 
neither end of the link gets confused and believes that the other is preventing it from sending data. 
Using a software flow control mechanism can eliminate the need for the aforementioned hardware 
handshaking signals and reduce the RS-232 link to its three basic wires: transmit, receive, and 
ground. These many permutations of DTE/DCE and various degrees of handshaking are what cause 
substantial grief to many engineers and technicians as they build and set up RS-232 equipment. 
There is a healthy industry built around the common RS-232 configuration problems. Breakout
boxes can be purchased that consist of jumper wires, switches, and LEDs to help troubleshoot RS-232
connectivity problems by reconfiguring interfaces on the fly as the LEDs indicate which signals
are active at any given moment. As a result of the male/female gender differences of various
DB25/DE9 connectors, there are often cabling problems for which one needs to connect two males or
two females together. Once again, the industry has responded by providing a broad array of
gender-matching cables and adapters. On a conceptual level, these problems are simple; in practice, the
permutations of incompatibilities are so numerous that debugging a 1960s-era RS-232 connection may
not be a quick task.

TABLE 5.2 RS-232 DTE Pin Assignments

DB25 DTE  DE9 DTE  Signal  Direction: DTE/DCE   Description
1         -        Shield  <=>                  Shield/chassis ground
2         3        TXD     =>                   Transmit data
3         2        RXD     <=                   Receive data
4         7        RTS     =>                   Request to send
5         8        CTS     <=                   Clear to send
6         6        DSR     <=                   Data set ready
7         5        Ground  <=>                  Signal ground
8         1        DCD     <=                   Data carrier detect
9         -        +V      <=>                  Power
10        -        -V      <=>                  Power return
11        -                                     Unused
12        -        SCF     <=                   Secondary line detect
13        -        SCB     <=                   Secondary CTS
14        -        SBA     =>                   Secondary TXD
15        -        DB      <=                   DCE element timing
16        -        SBB     <=                   Secondary RXD
17        -        DD      <=                   Receiver element timing
18        -                                     Unused/local-loopback
19        -        SCA     =>                   Secondary RTS
20        4        DTR     =>                   Data terminal ready
21        -        CQ      <=                   Signal quality detect
22        9        RI      <=                   Ring indicator
23        -        CH/CI   <=                   Data rate detect
24        -        DA      =>                   Transmitter element timing
25        -                                     Unused/test-mode
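Software flow control, mentioned above as a way to reduce the link to three wires, is commonly done with the XON/XOFF method using the DC1/XON (0x11) and DC3/XOFF (0x13) codes from Table 5.1. The class below is a minimal, hypothetical sketch of the transmitting side only; it assumes the receiver sends XOFF when its buffer is nearly full and XON once it has drained.

```python
XON, XOFF = 0x11, 0x13   # DC1 and DC3 in Table 5.1


class XonXoffSender:
    """Transmit side of XON/XOFF flow control (illustrative sketch)."""

    def __init__(self, data: bytes):
        self.pending = bytearray(data)
        self.paused = False

    def on_receive(self, byte: int) -> None:
        """Handle a byte arriving from the far end."""
        if byte == XOFF:
            self.paused = True    # far end asked us to stop
        elif byte == XON:
            self.paused = False   # far end is ready again

    def next_byte(self):
        """Next byte to transmit, or None if paused or finished."""
        if self.paused or not self.pending:
            return None
        return self.pending.pop(0)


s = XonXoffSender(b"AB")
print(s.next_byte())   # 65
s.on_receive(XOFF)
print(s.next_byte())   # None
s.on_receive(XON)
print(s.next_byte())   # 66
```

Because XON and XOFF travel in-band, the data stream itself must not contain those two byte values, which is one reason this method suits ASCII text better than raw binary data.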

Male DB25 and DE9 connectors consist of a dual row of staggered pins surrounded by a metal 
rim that serves as an electrical shield. The female connectors consist of matching staggered pin- 
sockets mounted in a solid frame whose edge forms a shield that mates with the male shield. These 
connectors are illustrated in Fig. 5.6. 

The D-subminiature connector family uses a three-element nomenclature to specify the size of 
the connector housing, or shell, and the number of pins within the shell. There are five standard shell 
designations (A, B, C, D, and E) that were originally specified with varying numbers of pins as shown
in Table 5.3. DE9 connectors are commonly mislabeled as DB9, a connector configuration that
is not defined. A modern D-subminiature connector that was not originally specified is the common 
HDE15, a high-density 15-pin connector using the E-size shell. The HDE15 is commonly used to 
connect monitors to desktop computers. 
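The three-element nomenclature lends itself to a simple parser. The sketch below is hypothetical: `parse_dsub` splits a designation into family prefix, shell letter, and pin count, and compares the result against the original shell/pin pairings of Table 5.3. The optional leading "H" for high-density parts such as HDE15 is handled as a special case.

```python
import re

# Shell sizes and their originally specified pin counts (Table 5.3).
STANDARD_SHELLS = {"A": 15, "B": 25, "C": 37, "D": 50, "E": 9}


def parse_dsub(name: str):
    """Split a D-sub designation such as 'DB25' or 'HDE15' into
    (high_density, shell, pins, is_original_standard_pairing)."""
    m = re.fullmatch(r"(H?)D([A-E])(\d+)", name.upper())
    if m is None:
        raise ValueError(f"not a D-sub designation: {name}")
    high_density = m.group(1) == "H"
    shell, pins = m.group(2), int(m.group(3))
    standard = not high_density and STANDARD_SHELLS[shell] == pins
    return high_density, shell, pins, standard


print(parse_dsub("DE9"))     # (False, 'E', 9, True)
print(parse_dsub("DB9"))     # (False, 'B', 9, False)
print(parse_dsub("HDE15"))   # (True, 'E', 15, False)
```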



TABLE 5.3 Standard D-Subminiature Shell Sizes

Shell Size  Pins
A           15
B           25
C           37
D           50
E           9



Logical transceiver-level characteristics such as bit rate, error detection, and framing are accom- 
panied by electrical transducer-level characteristics, more commonly referred to as the physical 
layer of a communications link. RS-232 refers to the logic-1 state as a mark and assigns it a negative
potential from -3 to -25 V. The logic-0 state is a space and is assigned a positive potential from +3
to +25 V. Since RS-232 inverts the logic levels, an idle link is held at a negative voltage (logic 1).
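The inverted voltage convention can be captured in two small functions. This is an idealized sketch with hypothetical names: `rs232_drive` assumes a ±12-V driver, and `rs232_receive` treats the band between -3 and +3 V as undefined. Practical receivers switch somewhere inside that band, so real parts decode smaller swings than this model does.

```python
def rs232_drive(bit: int, swing: float = 12.0) -> float:
    """Idealized driver: logic 1 (mark) -> negative, logic 0 (space) -> positive."""
    return -swing if bit else swing


def rs232_receive(volts: float):
    """Decode a line voltage: 1 (mark) at or below -3 V, 0 (space) at or
    above +3 V, None for the undefined region in between."""
    if volts <= -3.0:
        return 1
    if volts >= 3.0:
        return 0
    return None


print(rs232_receive(rs232_drive(1)))   # 1
print(rs232_receive(0.5))              # None
```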

While RS-232 is specified with a transmitter voltage range of ±3 to ±25 V, most modern transmit- 
ters operate well below the 25-V upper bound. Many systems have been based around the ubiquitous 
and inexpensive 1488/1489 transmitter/receiver chipset, which operates at ±12 V. The 1488 transmitter
requires external ±12-V supplies. RS-232 circuitry was fundamentally simplified when Maxim
Integrated Products created its MAX232 line of single-supply 5-V line interface ICs. These chips
contain internal circuitry that generates ±8 V. Today, a variety of flexible RS-232 interface ICs are
available from other manufacturers including Linear Technology, National Semiconductor, and Texas
Instruments. RS-232 ports work quite well on even lower voltage ranges, because modern receivers
are sensitive to smaller absolute voltages, and most RS-232 links are several meters or less in length.
RS-232 was never intended to serve in truly long-distance applications.

FIGURE 5.6 DB25 and DE9 connectors (male and female faces; female pin numbering mirrors the male).



5.5 RS-422 



For crossing distances greater than several meters, RS-232 is supplemented by the RS-422 standard. 
RS-422 can provide communications across more than 1.2 km at moderate bit rates such as 9.6 kbps. 
It is a differential, or balanced, transmission scheme whereby each logical signal is represented by 
two wires rather than one. RS-232 signals are single-ended, or unbalanced, signals that drive a par- 
ticular voltage onto a single wire. This voltage is sensed at the receiver by measuring the signal volt- 
age relative to the ground potential of the interface. Over long distances or at very high speeds, 
single-ended transmission lines are more subject to degradation resulting from ambient electrical 
noise. A partial explanation of this characteristic is that the electrical noise affects the active signal 
wire unequally with respect to ground. Differential signals, as in RS-422, drive opposing, or mir- 
rored, voltages onto two wires simultaneously (RS-422 is specified from ±2 to ±6 V). The receiver 
then compares the voltages of the two wires together rather than to ground. Ambient noise tends to 
affect the two wires equally, because they are normally twisted together to follow the same path. 
Therefore, if noise causes a 1-V spike on one-half of the differential pair, it causes the same spike on 
the other half. When the two voltages are electrically subtracted at the receiver, the 1 V of
common-mode noise cancels out, and the original differential voltage remains intact (subject, of
course, to natural attenuation over distance). The difference between RS-232 and RS-422
transmission is illustrated in Fig. 5.7.
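The 1-V spike argument above can be checked numerically. In this sketch the signal level, the noise amplitude, and both receiver functions are hypothetical; the point is only that subtracting two equally disturbed wires removes the disturbance, while measuring one wire against an undisturbed ground does not.

```python
def single_ended_rx(v_wire: float, v_ground: float = 0.0) -> float:
    """Single-ended receiver: the signal wire is measured against ground."""
    return v_wire - v_ground


def differential_rx(v_a: float, v_b: float) -> float:
    """Differential receiver: one wire is measured against the other."""
    return v_a - v_b


signal = 2.0   # hypothetical: 2-V single-ended level, or A = +1 V / B = -1 V
noise = 1.0    # hypothetical 1-V spike coupled onto the signal wires

se_error = single_ended_rx(signal + noise) - signal   # spike lands on the signal wire only
diff_error = differential_rx(+signal / 2 + noise, -signal / 2 + noise) - signal
print(se_error, diff_error)   # 1.0 0.0
```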

Because of the longer distances involved in RS-422 interfaces, it is not common to employ the 
standard set of hardware handshaking signals that are common with RS-232. Therefore, some form 
of software handshaking must be implemented by the end devices to properly communicate. Some 
applications may not require any flow control, and some may use the XON/XOFF method. RS-422 
does not specify a standard connector. It is not uncommon to see an RS-422 transmission line’s bare 
wire ends connected to screw terminals. 

Another common difference between RS-422 and RS-232 is transmission line termination. Trans- 
mission line theory can get rather complicated and is outside the scope of this immediate discussion. 




FIGURE 5.7 RS-232 vs. RS-422 signaling. 
The basic practical result of transmission line theory is that, as the speed-distance product of an
electrical signal increases, the signal tends to reflect off the ends of wires and bounce back and forth on
the wire. When slow signals travel relatively moderate distances, the speed-distance product is not
large enough to cause this phenomenon to any noticeable degree. Fast signals traveling over very
short distances may also be largely immune to such reflections. However, when RS-422 signals
travel over several kilometers, the speed-distance product is great enough to cause previously
transmitted data signals to reflect and interfere with subsequent data. This problem can be largely
solved by properly terminating the receiving end of the transmission line with the line’s
characteristic impedance, Z0. Typical coaxial and twisted-pair transmission lines have Z0 = 50, 75, or
110 Ω. Briefly put, Z0 is the impedance, or electrical resistance, that would be observed between both
conductors of a balanced transmission line of infinite length. Again, there is substantial theory lurking
here, but the practical result is that, by placing a resistor equal to Z0 at the far end of the line
between both conductors, the transmission line will appear to be continuous and not exhibit reflections.
A typical schematic diagram of a terminated RS-422 serial link is shown in Fig. 5.8. 
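One way to see why a resistor equal to Z0 suppresses reflections is the standard transmission-line reflection coefficient, Γ = (ZL - Z0)/(ZL + Z0), which is not given in the text above. The sketch assumes purely resistive impedances; the function name is hypothetical.

```python
def reflection_coefficient(z_load: float, z0: float) -> float:
    """Fraction of an incident voltage wave reflected at the end of a line of
    characteristic impedance z0 terminated in z_load (resistive values).
    Pass float('inf') for an unterminated (open) end."""
    if z_load == float("inf"):
        return 1.0
    return (z_load - z0) / (z_load + z0)


print(reflection_coefficient(110.0, 110.0))          # 0.0: matched, no reflection
print(reflection_coefficient(float("inf"), 110.0))   # 1.0: open end reflects everything
```

A matched termination gives Γ = 0, an open end reflects the full wave, and a short (ZL = 0) reflects it inverted (Γ = -1).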



5.6 MODEMS AND BAUD RATE 



Information is conveyed by varying the electromagnetic field of a particular medium over time. The 
rate at which this field (e.g., voltage) changes can be represented by a certain bandwidth that charac- 
terizes the information. Transducers such as those that facilitate RS-232/RS-422 serial links place 
the information that is presented to them essentially unmodified onto the transmission medium. In 
other words, the bandwidth of the information entering the transducer is equivalent to that leaving 
the transducer. Such a system operates at baseband: the bandwidth inherent to the raw information. 
Baseband operation is relatively simple and works well for a transmission medium that can carry 
raw binary signals with minimal degradation (e.g., various types of wire, or fiber optic cable, strung 
directly from transmitter to receiver). However, there are many desirable communications media that 
are not well suited to directly carrying bits from one point to another. Two prime examples are free- 
space and acoustic media such as a telephone. 

To launch raw information into the air or over a telephone, the bits must be superimposed upon a 
carrier that is suited to the particular medium. A carrier is a frequency that can be efficiently radi- 
ated from a transmitter and detected by a remote receiver. The process of superimposing the bits on 
the carrier is called modulation. The reverse process of detecting the bits already modulated onto the 
carrier is demodulation. For the purposes of this discussion, one of the simplest forms of modula- 
tion, binary amplitude modulation (AM), is presented as an example. More precisely, this type of 
AM is called amplitude shift keying (ASK). With two states, it is called 2-ASK and is illustrated in 
Fig. 5.9. Each time a 1 is to be transmitted, the carrier (shown as a sine wave of arbitrary frequency) 
is turned on with an arbitrary amplitude. Each time a 0 is to be transmitted, the carrier is turned off 
with an amplitude of zero. If transmitting over free space, the carrier frequency might be anywhere 
from hundreds of kilohertz to gigahertz. If communicating over a fiber optic cable, the carrier is
light. If an acoustic medium such as a telephone is used to send the data, the carrier is audible in the
range of several kilohertz.

FIGURE 5.8 RS-422 transmission line termination.

FIGURE 5.9 2-ASK modulation.
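The on/off behavior of 2-ASK can be sketched as a sample generator. The carrier frequency, bit rate, and sample rate in the usage line are hypothetical values in the audible range discussed above, and `ask2_samples` is an illustrative name. The carrier phase runs continuously across bit boundaries; only its amplitude is switched.

```python
import math


def ask2_samples(bits, carrier_hz: float, bit_rate: float,
                 sample_rate: float, amplitude: float = 1.0) -> list[float]:
    """2-ASK (on-off keying): carrier present for a 1, absent for a 0."""
    samples_per_bit = int(sample_rate / bit_rate)
    out = []
    for n_bit, bit in enumerate(bits):
        level = amplitude if bit else 0.0
        for k in range(samples_per_bit):
            t = (n_bit * samples_per_bit + k) / sample_rate   # absolute time
            out.append(level * math.sin(2 * math.pi * carrier_hz * t))
    return out


wave = ask2_samples([1, 0, 1], carrier_hz=1200.0, bit_rate=300.0, sample_rate=9600.0)
print(len(wave))   # 96: three bits of 32 samples each
```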

Frequency shift keying (FSK), a type of frequency modulation (FM), can transmit multiple bits per
modulated data unit without resorting to the multiple amplitude levels that AM would require. FSK
represents multiple bits by varying the frequency rather than the amplitude of the carrier. This
constant-amplitude approach is less susceptible to noise. Figure 5.10 shows 4-FSK modulation, in
which each of the four frequency steps represents a different two-bit binary value.
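The 4-FSK idea reduces to grouping bits in pairs and selecting one of four tones per pair. The tone frequencies in `TONES_HZ` and the order in which bit pairs are assigned to them are hypothetical; a real modem standard defines its own values.

```python
# Hypothetical tone set: one carrier frequency per two-bit value.
TONES_HZ = {(0, 0): 1000.0, (0, 1): 1200.0, (1, 0): 1400.0, (1, 1): 1600.0}


def fsk4_symbols(bits) -> list[float]:
    """Group bits into pairs and return the tone frequency for each pair."""
    if len(bits) % 2:
        raise ValueError("4-FSK needs an even number of bits")
    return [TONES_HZ[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


print(fsk4_symbols([0, 0, 1, 1, 1, 0]))   # [1000.0, 1600.0, 1400.0]
```

Six bits become three tones, so each modulated unit carries two bits.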

A general term for a modulated data unit is a baud. If 2-ASK is used, each baud corresponds to 
one bit. Therefore, the baud rate matches the bit rate. However, the 4-FSK example shows that each 
baud represents two bits, making the bit rate twice that of the baud rate. This illustrates that baud 
rate and bit rate are related but not synonymous, despite common misuse in everyday conversation. 
Engineers who design modulation circuitry care about the baud rate, because it specifies how many 
unique data units can be transmitted each second. They also try to squeeze as many bits per baud as 
possible to maximize the overall bit rate of the modulator. Engineers who use modulators as black- 
box components do not necessarily care about the baud rate; rather, it is the system’s bit rate that 
matters to the end application. 
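The relationship between baud rate and bit rate is simple arithmetic: if each baud can take one of M distinguishable states, it carries log2(M) bits. A minimal sketch (the function name is hypothetical):

```python
import math


def bit_rate(baud: float, states: int) -> float:
    """Bits per second when each baud takes one of `states` distinct values."""
    return baud * math.log2(states)


print(bit_rate(300, 2))   # 300.0: 2-ASK, one bit per baud
print(bit_rate(300, 4))   # 600.0: 4-FSK, two bits per baud
```

By the same arithmetic, a hypothetical scheme with 16 distinct states running at 2,400 baud would deliver 9,600 bps.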

Enter the modem. A modem is simply a device that incorporates a modulator and demodulator for 
a particular transmission medium. The most common everyday meaning of modem is one that en- 
ables a computer to transfer bits over an analog telephone line. These modems operate using differ- 
ent modulation schemes depending on their bit rate. Early 300- and 1,200-bps modems operate 
using FSK and phase shift keying (PSK). Later modems, including today’s 33.6- and 56-kbps mod- 
els, operate using variations of quadrature amplitude modulation (QAM). 

While modem often refers to telephone media, it is perfectly correct to use this term when refer- 
ring to a generic modulator/demodulator circuit that operates on another medium. Digital wireless 
communication is increasingly common in such applications as portable cellular phones and unteth- 
ered computer networking. These devices incorporate radio frequency (RF) modems in addition to 
digital transceivers that frame the data as it travels from one point to another. 



5.7 NETWORK TOPOLOGIES 



The communications schemes discussed thus far are point-to-point connections: they involve one
transmitter and one receiver at either end of a given medium. Many applications require multidrop
communications whereby multiple devices exchange data over the same medium. The general term
for a multidrop data link is a network. Networks can be constructed in a variety of topologies: buses,
rings, stars, and meshes, as shown in Fig. 5.11.

FIGURE 5.10 4-FSK modulation.

A bus structure is the most basic network topology in which all nodes share the same physical me- 
dium. When one node wishes to transmit data, it must wait for another node to finish and release the 
bus before it can begin. The ring topology implements a daisy-chained set of connections where each 
node connects to its two nearest neighbors, and information usually flows in one direction (although 
bidirectional rings are a variation on this theme). A benefit of the ring is that a single long wire does 
not have to travel between all nodes. One disadvantage is that each node is burdened with the require- 
ment of passing on information that is not destined for it to keep the message from being lost. 

Mesh networks provide ultimate connectivity by connecting each node to several of its neighbors. 
A mesh can provide increased bandwidth as well as fault tolerance as a result of its multiple connec- 
tions. Properly designed, a mesh can route traffic around a failed link, because multiple paths exist 
between each node in the network. The downside to these benefits is increased wiring and communi- 
cations protocol complexity. 

Star networks connect each node to a common central hub. The benefits of a physical star topol- 
ogy include ease of management, because adding or removing nodes does not affect the wiring of 
other nodes. A downside is that more wiring is necessary to provide a unique physical connection 
between each node and the central hub. A starred network may send data only to the node for which 
it is destined. Unlike a ring, the node does not have to pass through information that is not meant for 
it. And unlike a bus, the node does not have to ignore messages that are not meant for it. The require- 
ment for a central hub increases the complexity of a star network. As more nodes are added to the 
network, the hub must add ports at the same rate. 

A network may be wired using a physical star topology, but it may actually be a bus or ring from 
a logical, or electrical, perspective. Implementing differing physical and logical topologies is illus- 
trated in Fig. 5.12. Some types of networks inherently favor bus or ring topologies, but the flexible 
management of star wiring is an attractive alternative to a strictly wired bus or ring. Star wiring
enables nodes to be quickly added to or disconnected from the central hub without disrupting other
nodes. Bus and ring topologies may require the complete or partial disruption of the network me- 
dium to add or remove nodes. A star’s hub typically contains electronics to include or bypass indi- 
vidual segments as they are added or removed from the network without disrupting other nodes. 



5.8 NETWORK DATA FORMATS 



Common data formats and protocols are necessary to regulate the flow of data across a network to
ensure proper addressing, delivery, and access to that common resource. Several general terms for
message elements on a network are frame, packet, and cell. Frames are sets of data that are framed at
the beginning and end by special delimiters. Packets are sets of data that are not fully framed but that
have some other means of determining their size, such as an embedded length field. Cells are
fixed-length frames or packets. Frames and packets usually imply variable-length data sets, but this is
not a strict rule. As with many terms and classifications in digital systems, specific definitions depend
on context and are often blurred: one system’s cells may be another’s frames. Frames, packets, and
cells are composed of headers, payloads, and trailers, as shown in Fig. 5.13. The header is a
collection of data fields that handle network overhead functions such as addressing and delineation.
The actual data to be transmitted is placed into the payload. If present, a trailer is commonly used to
implement some form of error checking and/or delineation. Not all packet formats specify the
inclusion of trailers. When present, a trailer is usually substantially smaller in length than the header.

FIGURE 5.11 Basic network topologies.

FIGURE 5.12 Physical vs. logical network topologies.
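To make the header/payload/trailer structure concrete, here is a sketch of one possible packet layout. The format is entirely hypothetical: a three-byte header (destination, source, payload length), the payload, and a one-byte trailer holding the sum of all preceding bytes modulo 256 as a simple error check.

```python
def build_packet(dest: int, src: int, payload: bytes) -> bytes:
    """Hypothetical layout: [dest][src][length] payload [checksum]."""
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    header = bytes([dest, src, len(payload)])
    checksum = sum(header + payload) & 0xFF   # trailer: 8-bit sum of header + payload
    return header + payload + bytes([checksum])


def parse_packet(packet: bytes):
    """Return (dest, src, payload), or raise ValueError on a malformed packet."""
    if len(packet) < 4:
        raise ValueError("packet shorter than header + trailer")
    dest, src, length = packet[0], packet[1], packet[2]
    if len(packet) != 3 + length + 1:
        raise ValueError("length field does not match packet size")
    if sum(packet[:3 + length]) & 0xFF != packet[-1]:
        raise ValueError("checksum mismatch")
    return dest, src, packet[3:3 + length]


p = build_packet(0x02, 0x01, b"Hi")
print(p.hex())           # 0201024869b6
print(parse_packet(p))   # (2, 1, b'Hi')
```

The same format knowledge appears on both sides: hardware or firmware building packets and software parsing them must agree on every field.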

Networking is an aspect of digital systems design that directly involves hardware-software inter- 
action at a basic level. One cannot really design networking hardware without keeping in mind the 
protocol, or software, support requirements. One key example is packet format. Hardware must have 
knowledge of the packet format so that it can properly detect a packet that is sent to it. At the same 
time, software must have this same knowledge so that it can properly parse received packets and 
generate new ones to be transmitted. 

As soon as more than two nodes are connected to form a network, issues such as addressing and 
shared access arise. When there is only one transmitter and one receiver, it is obvious that data is in- 
tended for the only possible recipient. Likewise, the lone transmitter can begin sending data at any 
time it chooses, because there are no other transmitters competing for network access. 

Network addressing is the mechanism by which a transmitting node indicates the destination for
its packet. Each node on the network must therefore have a unique address to prevent confusion over
where the packet should be delivered. In a bus topology, each node watches all the data traffic that is
placed onto the network and picks out those packets that are tagged with its unique address. In a ring
topology, each node passes packets on to the next node if the destination address is not matched with
that node’s address. If the address is matched, the node absorbs the packet and does not forward it on
to the next node in the ring. Logical star and mesh topologies function a bit differently. Nodes on
these types of networks do not observe all traffic that traverses the network; rather, the network itself
contains some intelligence when it comes to delivering a packet. A node in a logically starred
network sends a packet to a central hub that examines the destination address and then forwards the
message only to the specified node. A mesh network routes traffic partly like a ring and partly like a
star; however, multiple paths between nodes exist to complicate the delivery process. Based on the
destination address, the originating node sends a packet to one of its neighbors, which in turn
forwards the message to one of its neighbors. This process continues until the path has been completed
and the packet arrives at its intended destination. The presence of multiple valid paths between
nodes requires the mesh network to use knowledge about the location of nodes to select an optimal
path through the network.

FIGURE 5.13 Generic packet structure.
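Path selection in a mesh can be illustrated with breadth-first search, which finds a route with the fewest hops. This is one possible metric, swapped in for illustration; real mesh networks may weigh bandwidth, load, or link quality instead. The five-node topology and function names are hypothetical.

```python
from collections import deque


def build_links(edges):
    """Adjacency lists for a mesh with bidirectional links."""
    links = {}
    for a, b in edges:
        links.setdefault(a, []).append(b)
        links.setdefault(b, []).append(a)
    return links


def shortest_path(links: dict, src: str, dst: str):
    """Breadth-first search: a fewest-hops path from src to dst, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:   # walk back to the source
                path.append(node)
                node = prev[node]
            return path[::-1]
        for neighbor in links.get(node, ()):
            if neighbor not in prev:
                prev[neighbor] = node
                queue.append(neighbor)
    return None


edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
print(shortest_path(build_links(edges), "A", "E"))   # ['A', 'B', 'D', 'E']
failed = [e for e in edges if e != ("B", "D")]       # simulate a broken B-D link
print(shortest_path(build_links(failed), "A", "E"))  # ['A', 'C', 'D', 'E']
```

The second call shows the fault tolerance described earlier: with one link down, traffic is routed around the failure.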

Access sharing is necessary on networks to ensure that each node eventually has an opportunity to 
send a message. Numerous methods of access sharing have been implemented over the years. Gen- 
erally speaking, the length of messages is bounded to prevent a node from transmitting an infinitely 
long set of data and preventing anyone else from gaining access to the shared medium. Sharing algo- 
rithms differ according to the specific network topology involved. Networks that are a collection of 
point-to-point links (e.g., ring, star, mesh) do not have to worry about multiple nodes fighting for ac- 
cess to the same physical wire, but do have to ensure that one node does not steal all the bandwidth 
from others. Bus networks require sharing algorithms that address both simultaneous physical con- 
tention for the same shared wire in addition to logical contention for the network’s bandwidth. Arbi- 
tration schemes can be centralized (whereby a single network master provides permission to each 
node to transmit) or distributed (whereby each node cooperates on a peer-to-peer level to resolve si- 
multaneous access attempts). 
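As an example of a centralized scheme, the sketch below implements a round-robin arbiter: the master grants one requesting node at a time and starts each search just after the node it last granted, so no single node can monopolize the medium. Round-robin is one common policy chosen here for illustration; the class name and interface are hypothetical.

```python
class RoundRobinArbiter:
    """Centralized arbiter that rotates priority among requesting nodes."""

    def __init__(self, num_nodes: int):
        self.num_nodes = num_nodes
        self.last = num_nodes - 1   # so node 0 is checked first

    def grant(self, requests):
        """requests: set of node indices wanting to transmit.
        Returns the node granted access, or None if nobody is requesting."""
        for offset in range(1, self.num_nodes + 1):
            node = (self.last + offset) % self.num_nodes
            if node in requests:
                self.last = node
                return node
        return None


arb = RoundRobinArbiter(4)
print([arb.grant({0, 2}) for _ in range(3)])   # [0, 2, 0]
```

Even though node 0 requests continuously, node 2 still receives every other grant.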

After deciding on a network topology, one of the first issues to resolve is the network packet for- 
mat. If the network type is already established (e.g., Ethernet), the associated formats and protocols 
are already defined by industry and government standards committees. If an application benefits from 
a simple, custom network, the packet format can be tailored to suit the application’s specific needs. 

Delineation and addressing are the two most basic issues to resolve. Delineation can be accomplished
by sending fixed-size packets, by embedding a length field in the packet header, or by reserving
unique data values to act as start/stop markers. Framing with unique start/stop codes places a restric- 
tion on the type of data that a packet can contain: it cannot use these unique codes without causing 
false start or end indications. Referring back to Table 5.1, notice that start-of-header (SOH) and end- 
of-transmission (EOT) are represented by 0x01 and 0x04. These (or other pairs of codes) can be 
used as delimiters if the packet is guaranteed to contain only alphanumeric ASCII values that do not 
conflict with these codes. 
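Delimiter-based framing with SOH (0x01) and EOT (0x04) can be sketched as a sender that refuses payloads containing control codes and a receiver that scans a byte stream for complete frames. Both function names are hypothetical. The receiver discards bytes outside a frame and restarts a frame whenever a new SOH arrives.

```python
SOH, EOT = 0x01, 0x04   # start-of-header and end-of-transmission (Table 5.1)


def frame_message(text: str) -> bytes:
    """Wrap printable ASCII text in SOH ... EOT delimiters."""
    data = text.encode("ascii")
    if any(b < 0x20 or b == 0x7F for b in data):
        raise ValueError("payload must not contain control codes")
    return bytes([SOH]) + data + bytes([EOT])


def extract_messages(stream: bytes) -> list[bytes]:
    """Return each complete SOH ... EOT payload found in a byte stream."""
    messages, current = [], None
    for b in stream:
        if b == SOH:
            current = bytearray()            # (re)start a frame
        elif b == EOT and current is not None:
            messages.append(bytes(current))  # frame complete
            current = None
        elif current is not None:
            current.append(b)                # bytes outside a frame are dropped
    return messages


stream = b"xx" + frame_message("HELLO") + frame_message("OK") + b"\x01PART"
print(extract_messages(stream))   # [b'HELLO', b'OK']
```

The trailing partial frame is not returned because its EOT never arrived.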

Addressing is normally achieved by inserting both the destination and source addresses into the 
header. However, some networking schemes may send only a single address. Sending both addresses 
enables recognition of the destination as well as a determination of which node sent the packet. Since 
most data exchanges are bidirectional to a certain degree, a destination node will probably need to send 
some form of reply to the source node of a particular packet. Many networks include a provision known 
as broadcast addressing whereby a packet is sent to all nodes on the network rather than just one. This 
broadcast is often indicated using a reserved broadcast address. In contrast to a unicast address that is 
matched by only one node, a broadcast address is matched by all nodes on the network. Some networks 
also have multicast addresses that associate multiple nodes with a single destination address. 



5.9 RS-485 



Whereas RS-232 and RS-422 enable point-to-point serial links, the RS-485 standard enables multi- 
ple-node networks. Like RS-422, RS-485 provides differential signaling to enable communications 
across spans of twisted-pair wire exceeding 1.2 km. Unlike RS-422, the RS-485 standard allows up
to 32 transmit/receive nodes on a single twisted pair that is terminated at each end as shown in Fig.
5.14. Modern low-load receivers that draw very little current from the RS-485 bus can be used to
increase the number of nodes on an RS-485 network well beyond the original 32-node limit to 256
nodes or more. A single pair of wires is used for both transmit and receive, meaning that the system
is capable of half-duplex operation (one direction at a time) rather than full-duplex operation (both
directions at the same time). When node A is sending a packet to node B, node B cannot
simultaneously send a packet to node A.

RS-485 directly supports the implementation of bus networks. Bus topologies are easy to work 
with, because nodes can directly communicate with each other without having to pass through other 
nodes or semi-intelligent hubs. However, a bus network requires provisions for sharing access to be 
built into the network protocol. In a centralized arbitration scheme, a master node gives permission 
for any other node to transmit data. This permission can be a request-reply scheme whereby slave 
nodes do not respond unless a request for data is issued. Alternatively, slave nodes can be periodi- 
cally queried by the master for transmit requests, and the master can grant permissions on an indi- 
vidual-node basis. There are many centralized arbitration schemes that have been worked out over 
the years. 
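
As one illustration of a centralized scheme, the master can scan its slaves in round-robin order and
grant the bus to the next node that has answered its poll with a transmit request. This is a sketch of
one possibility, not a scheme from any standard: the pending[] array is a hypothetical stand-in for
the master's record of which slaves asked to transmit. Starting each scan just after the most
recently granted node keeps any one node from monopolizing the bus.

```c
#include <stdbool.h>

/* pending[i] is true if slave i has requested permission to transmit.
   last_granted is the index of the previous grant, or -1 at start-up.
   Returns the next node to grant, or -1 if no node is waiting. */
int poll_next_grant(const bool *pending, int num_nodes, int last_granted)
{
    for (int step = 1; step <= num_nodes; step++) {
        /* Visit nodes in order, starting just after the last grant. */
        int node = (last_granted + step) % num_nodes;
        if (pending[node])
            return node;
    }
    return -1;
}
```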

A common distributed arbitration scheme on a bus network is collision detection with random 
back-off. When a node wants to transmit data, it first waits until the bus becomes idle. Once idle, the 
node begins transmitting data. However, when the node begins transmitting, there is a chance that 
one or more nodes have been waiting for an opportunity to begin transmitting and that they will be- 
gin transmitting at the same time. Collision detection circuits at each node determine that more than 
one node is transmitting, and this causes all active transmitters to stop.

FIGURE 5.14 RS-485 bus topology.

FIGURE 5.15 RS-485 collision detection transceiver.

Figure 5.15 shows the implementation of an RS-485 transceiver with external collision detection
logic. A transmit enable signal exists to turn off the transmitter when the UART is not actively
sending data. Unlike an RS-422 transmitter, which does not have to share access with others, the
RS-485 transmitter must turn itself off when not sending data so that other nodes can transmit.

When the node is transmitting, its receiver returns the logical state of the twisted-pair bus. If the
bus is not at the same state as the transmitted data, a collision is most likely being caused by another
transmitter trying to drive the opposite logic state. An XOR gate implements this collision detect,
and the XOR output must be sampled only after allowing adequate time for the bus to settle to a
stable state following the assertion of each bit from the transmitter.
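
The XOR comparison can be modeled behaviorally in C. This sketch is not a hardware description: it
assumes the bits read back from the bus have already been sampled after the settling delay, and it
compares them in the least-significant-bit-first order a UART uses. The function names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* One collision-detect sample: XOR of the bit driven and the bit read
   back. Valid only after the bus has settled for the current bit. */
static inline bool collision_detected(bool tx_bit, bool rx_bit)
{
    return tx_bit ^ rx_bit;
}

/* Compares a transmitted byte with the settled read-back samples, bit 0
   first as a UART sends it. Returns the index of the first bit that
   disagreed, or -1 if no collision was seen. */
int first_collision_bit(uint8_t tx_byte, uint8_t rx_byte)
{
    for (int bit = 0; bit < 8; bit++) {
        bool tx = (tx_byte >> bit) & 1u;
        bool rx = (rx_byte >> bit) & 1u;
        if (collision_detected(tx, rx))
            return bit;  /* a real node would disable its transmitter here */
    }
    return -1;
}
```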

Once a collision has been detected by each node and the transmitters are disabled, each node 
waits a different length of time before retransmitting. If all delays were equal, multiple nodes would 
get caught in a deadlock situation wherein each node keeps trying to transmit after the same delay 
interval. Back-off delays are chosen pseudo-randomly, rather than fixed for each node, so as not to
unfairly burden some nodes with consistently longer delays than others. At the end of the delay, one
of the nodes begins transmitting first and gains control of the bus by default. The other waiting nodes eventually exit from
their delays and observe that the bus is already busy, indicating that they must wait their turn until 
the current packet has been completed. If, by coincidence, another node begins transmitting at the 
same time that the first node begins, the back-off process begins again. It is statistically possible for 
this process to occur several times in a row, although the probability of this being a frequent event is 
small in a properly designed network. A bus network constructed with too many nodes trying to send 
too much data at the same time can exhibit very poor performance, because it would be quite prone 
to collisions. In such a case, the solution may be to either reduce the network traffic or increase the 
network’s bandwidth. 
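
A small simulation shows why randomized delays eventually break the tie. The sketch assumes slotted
delays drawn uniformly from 0 to window - 1 with a xorshift32 generator, and it makes the simplifying
assumption that every contending node redraws after a repeated collision. Real networks such as
Ethernet use more elaborate rules (for example, a window that doubles after each successive
collision). All names are hypothetical.

```c
#include <stdint.h>

/* xorshift32 pseudo-random generator; state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* num_nodes nodes have just collided. Each picks a delay of 0..window-1
   slots; the node with the unique shortest delay wins the bus. If two or
   more tie for shortest, they collide again and all nodes redraw.
   Returns the number of rounds needed (winner stored in *winner), or -1
   if the arguments make a tie impossible to break. */
int backoff_rounds(int num_nodes, int window, uint32_t seed, int *winner)
{
    if (num_nodes < 1 || window < 1 || (num_nodes > 1 && window < 2))
        return -1;
    uint32_t state = seed ? seed : 1u;
    for (int round = 1; ; round++) {
        int best = window, best_node = -1, ties = 0;
        for (int n = 0; n < num_nodes; n++) {
            int delay = (int)(xorshift32(&state) % (uint32_t)window);
            if (delay < best) {
                best = delay;
                best_node = n;
                ties = 1;
            } else if (delay == best) {
                ties++;
            }
        }
        if (ties == 1) {
            *winner = best_node;
            return round;
        }
        /* Shortest delay was shared: another collision, so try again. */
    }
}
```

Shrinking window or adding nodes makes repeated rounds more likely, which mirrors the text's point
that an overloaded bus network becomes collision-prone.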



5.10 A SIMPLE RS-485 NETWORK



An example of a simple but effective network implemented with RS-485 serves as a vehicle to dis- 
cuss how packet formats, protocols, and hardware converge to yield a useful communications me- 
dium. The motivation to create a custom RS-485 network often arises from a need to deploy remote 
actuators and data-acquisition modules in a factory or campus setting. A central computer may be
located in a factory office, and it may need to periodically gather process information (e.g.,
temperature, pressure, fluid-flow rate) from a group of machines. Alternatively, a security control console
located in one building may need to send security camera positioning commands to locations 
throughout the campus. Such applications may involve a collection of fairly simple and inexpensive 
microprocessor-based modules that contain RS-485 transceivers. Depending on the exact physical 
layout, it may or may not be practical to wire all remote nodes together in a single twisted-pair bus. 
If not, a logical bus can be formed by creating a hybrid star/bus topology as shown in Fig. 5.16. A 
central hub electrically connects the individual star segments so that they function electrically as a 
large bus but do not require a single wire to be run throughout the entire campus. 

As shown, the hub does not contain any intelligent components — it is a glorified junction box. 
This setup is adequate if the total length of all star segments does not exceed 1.2 km, which is within 
the electrical limitations of the RS-485 standard. While simple, this setup suffers from a lack of fault 
tolerance. If one segment of the star wiring is damaged, the entire network may cease operation be- 
cause, electrically, it is a single long pair of wires. Both the distance and fault-tolerance limitations 
can be overcome by implementing an active hub that contains repeaters on each star segment and 
smart switching logic to detect and isolate a broken segment. A repeater is an active two-port device 
that amplifies or regenerates the data received on one port and transmits it on the other port. An
RS-485 repeater needs a degree of intelligence, because both ports must be bidirectional. Therefore,
the repeater must be able to listen for traffic on both sides, detect traffic on one side, and then
transmit that traffic on the other side. A well-designed hub that detects and isolates segment
failures would also report this fault information to a central control node to alert the human
operator that repairs are necessary. These possible improvements in the network hub do not affect
the logical operation of the network and, consequently, are not a focus of this discussion.

FIGURE 5.16 Hybrid star/bus network topology.

With a topology chosen and a general application in mind, the next step is to decide on the net- 
work's operational requirements from among the following: 

1. Support for roughly 200 nodes provides flexibility for a variety of control applications. 

2. Central arbitration handled by master control node for simplicity of network design. A facility 
control network is often a master-slave application, because all data transfers are at the request 
of the central controller. Central arbitration removes the need for collision-detect hardware and 
random back-off algorithms. 

3. Broadcast capability enables easy distribution of network status information from the master 
control node. 

4. Data rate of 9600 bps provides adequate bandwidth for small control messages without burden- 
ing the network with high frequencies that can lead to excessive noise and signal degradation. 

5. Basic error handling prevents processing incorrect data and network lock-up conditions when
occasional noise on the RS-485 twisted pairs causes data bits to change state.

Many aspects of network functionality are directly influenced by a suitable network packet for- 
mat. Other aspects are addressed by the protocol that formats data on the network, by the transceiver 
and UART hardware, or by a combination of these three elements. 

In considering the packet format, 8-bit destination and source addresses are chosen to support
more than 200 nodes on the network. A special destination address value of 0xFF represents a
broadcast address, meaning that all nodes should accept the packet automatically. Such broadcast
packets are useful for system-wide initialization whereby, for example, the control computer can send
the current time to all nodes. This broadcast address cannot be used as a normal node address,
thereby limiting the network to 255 unique nodes.

It is desirable to employ variable-length packets so that a message does not have to be longer than 
necessary, thereby conserving network bandwidth. Variable-length packets require some mechanism 
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to determine the length: either reserved framing codes or an explicit length count. A length count is 
chosen to keep the system simple. Framing codes would require that certain data values be off limits 
to the contents of the message. The payload length is bounded at a convenient binary threshold: 255 
bytes. For simple control and data-acquisition applications, this is probably more than enough. 

Based on these basic requirements and a couple of quick decisions, a packet format readily
emerges. A three-byte, fixed-length header shown in Table 5.4 is followed by a variable-length
payload. No trailer is necessary in this network.



TABLE 5.4 Hypothetical Packet Header Format

Field Name   Byte   Bits    Description
DA           0      [7:0]   Destination address (0xFF = broadcast)
SA           1      [7:0]   Source address
LEN          2      [7:0]   Payload length (0x0 = no payload present)



The eight-bit destination address field, DA, comes first to enable the receiving hardware to
quickly determine whether the packet should be accepted by the node or ignored. A packet will be
accepted if DA matches the receiver’s node address, or if DA equals 0xFF, indicating a broadcast
packet. At the end of the header is an eight-bit length field that indicates how many payload bytes are
present after the fixed-length header. This limits the maximum packet size to 255 payload bytes plus
the 3-byte header. A value of zero means that there is no payload, only a header in the packet.
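
The header layout and acceptance rule translate directly into code. The following C sketch assumes
the byte order of Table 5.4; build_packet, should_accept, and BROADCAST_ADDR are hypothetical names.
Because the length argument is an 8-bit value, the 255-byte payload limit is enforced by its type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BROADCAST_ADDR 0xFFu
#define HEADER_LEN     3u     /* DA, SA, payload length */

/* Serializes DA, SA, length, then the payload into out.
   Returns the total packet length, or 0 if out is too small. */
size_t build_packet(uint8_t da, uint8_t sa, const uint8_t *payload,
                    uint8_t len, uint8_t *out, size_t out_cap)
{
    if (out_cap < HEADER_LEN + (size_t)len)
        return 0;
    out[0] = da;
    out[1] = sa;
    out[2] = len;
    for (size_t i = 0; i < len; i++)
        out[HEADER_LEN + i] = payload[i];
    return HEADER_LEN + (size_t)len;
}

/* Address filter applied as soon as the first byte (DA) arrives. */
bool should_accept(uint8_t da, uint8_t my_addr)
{
    return da == my_addr || da == BROADCAST_ADDR;
}
```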

Error detection can be handled by even parity. Each byte of the header and payload is sent with an 
accompanying parity bit. When an error is detected, the network’s behavior must be clearly defined 
to prevent the system from either ceasing to function or acting on false data. Parity errors can mani- 
fest themselves in a variety of tricky ways. For example, if the length field has a parity error, how 
will the receiver know the true end of the frame? Without proper planning, a parity error on the 
length field can permanently knock the receivers out of sync and make automatic recovery impossi- 
ble. This extreme situation can occur when an invalid length causes the receiver to either skip over 
the next frame header or prematurely interpret the end of the current frame as a new header. In both 
cases, the receiver will falsely interpret a bogus length field, and the cycle of false header detection 
can continue indefinitely. 
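
Even parity is normally generated and checked by the UART hardware, but the rule is simple to express
in software. The sketch below, with hypothetical function names, computes the bit a transmitter
would append and the check a receiver would apply.

```c
#include <stdbool.h>
#include <stdint.h>

/* Even parity: the appended bit makes the total number of 1s across the
   8 data bits plus the parity bit even. */
uint8_t even_parity_bit(uint8_t data)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++)
        p ^= (data >> i) & 1u;
    return p;  /* 1 when data contains an odd number of 1s */
}

/* Receiver check: true if data plus the received parity bit still
   contain an even number of 1s. */
bool parity_ok(uint8_t data, uint8_t parity_bit)
{
    return even_parity_bit(data) == (parity_bit & 1u);
}
```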

If a parity error is detected on either the destination or source addresses, the receivers will not lose 
synchronization, but the packet should be ignored, because it cannot be known who the true recipi- 
ent or sender of the packet is. 

Fault tolerance in the case of an invalid payload length can be handled in a relatively simple man- 
ner. Requirements of no intrapacket gaps and a minimum interpacket gap assist in recovery from 
length-field parity errors. The absence of intrapacket gaps means that, once a packet has begun trans- 
mission, its bytes must be continuous without gaps. Related to this is the requirement of a minimum 
interpacket gap, which forces a minimum idle period between the last byte of one packet and the start
of the next packet. These requirements help each receiver determine when packets are starting and 
ending. Even if a packet has been subjected to parity errors, the receiver can wait until the current 
burst of traffic has ended, wait for the minimum interpacket gap, and then begin looking for the next 
packet to begin. 
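
The gap-based recovery can be sketched as a receive state machine. This is one possible structure
under stated assumptions: each byte arrives with a microsecond timestamp and a parity flag from the
UART, and any idle interval of at least min_gap_us is treated as a packet boundary. A parity error in
the length byte causes the rest of the burst to be discarded until the next gap. All type and
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { RX_NONE, RX_GOOD_PACKET, RX_BAD_PACKET } rx_result_t;
typedef enum { ST_HUNT, ST_HEADER, ST_PAYLOAD, ST_WAIT_GAP } rx_state_t;

typedef struct {
    rx_state_t state;    /* start in ST_HUNT */
    uint32_t   last_us;  /* arrival time of the previous byte */
    unsigned   count;    /* bytes received so far in this packet */
    unsigned   len;      /* payload length taken from header byte 2 */
    bool       bad;      /* a parity error was seen in this packet */
} rx_ctx_t;

/* Feeds one received byte into the framing logic. */
rx_result_t rx_byte(rx_ctx_t *rx, uint8_t byte, bool parity_ok,
                    uint32_t now_us, uint32_t min_gap_us)
{
    /* Packets contain no internal gaps, so an idle period of at least
       min_gap_us always marks a packet boundary. */
    if (now_us - rx->last_us >= min_gap_us)
        rx->state = ST_HUNT;
    rx->last_us = now_us;

    switch (rx->state) {
    case ST_WAIT_GAP:
        /* Resynchronizing, or past the end of a packet: ignore the rest
           of this burst until a gap is seen. */
        return RX_NONE;
    case ST_HUNT:
        /* The first byte after a gap is the destination address. */
        rx->state = ST_HEADER;
        rx->count = 0;
        rx->bad = false;
        /* fall through */
    case ST_HEADER:
    case ST_PAYLOAD:
        if (!parity_ok)
            rx->bad = true;
        rx->count++;
        if (rx->count == 3) {
            if (!parity_ok) {
                /* Length cannot be trusted: drop and wait for a gap. */
                rx->state = ST_WAIT_GAP;
                return RX_BAD_PACKET;
            }
            rx->len = byte;
            rx->state = ST_PAYLOAD;
        }
        if (rx->count >= 3 && rx->count == 3 + rx->len) {
            rx->state = ST_WAIT_GAP;  /* next packet must follow a gap */
            return rx->bad ? RX_BAD_PACKET : RX_GOOD_PACKET;
        }
        return RX_NONE;
    }
    return RX_NONE;
}
```

Because boundaries are inferred from silence rather than from the length field alone, a single
corrupted length byte costs at most the current packet instead of the receiver's synchronization.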

The parity error detection and accompanying recovery scheme greatly increases the probability 
that false data will not be acted upon as correct data and that the entire network will not stop func- 
tioning when it encounters an arbitrary parity error. However, error detection is all about probability. 
A single parity bit cannot guarantee the detection of multiple errors in the same byte, because such
errors can cancel each other out: parity detects any odd number of bit errors but misses any even
number. For example, two bit errors can flip a data bit and the parity bit itself, making it impossible
for the receiver to detect the error. More complex error detection schemes, such as checksums and
cyclic redundancy checks (CRCs), are available and are more difficult to fool. Although no error
detection solution is perfect, some schemes reduce the probability of undetected errors to nearly zero.
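
The masking effect is easy to demonstrate. This sketch appends an even parity bit to a byte, flips an
arbitrary set of the resulting nine bits (bits 0 through 7 are data, bit 8 is parity), and reports
whether the receiver's parity check still passes. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

static uint8_t even_parity_bit(uint8_t data)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++)
        p ^= (data >> i) & 1u;
    return p;
}

/* Builds the 9-bit character (data plus even parity), applies the bit
   flips in flip_mask, and returns true if the parity check still passes,
   i.e., if the corruption would go undetected. */
bool parity_check_passes(uint8_t data, unsigned flip_mask)
{
    uint16_t ch = (uint16_t)(data | (even_parity_bit(data) << 8));
    ch ^= (uint16_t)(flip_mask & 0x1FFu);
    uint8_t rx_data = (uint8_t)(ch & 0xFFu);
    uint8_t rx_parity = (uint8_t)((ch >> 8) & 1u);
    return even_parity_bit(rx_data) == rx_parity;
}
```

For example, flipping one data bit of 0x41 (mask 0x001) fails the check, while flipping that data bit
together with the parity bit (mask 0x101), or any two data bits (mask 0x003), passes unnoticed.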

If a packet is received with an error, it cannot be acted upon normally, because its contents are 
suspect. For the purposes of devising a useful error-handling scheme, packet errors can be divided 
into two categories: those that corrupt the destination/source address information and those that do 
not. Parity errors that corrupt the packet's addresses must result in the packet being completely ig- 
nored, because the receiving node is unable to generate a reply message to the originator indicating 
that the packet was corrupted. If the source address is corrupted, the receiver does not know to whom 
to reply. If the destination address is corrupted, the receiver does not know whether it is the
intended recipient.

In the case of an address error in which the received packet is ignored, the originator must imple- 
ment some mechanism to recover from the packet loss rather than waiting indefinitely for a reply 
that will never arrive. A reply timeout can be implemented by an originator each time a packet is sent 
that requires a corresponding reply. A timeout is an arbitrary delay during which an originating node 
waits before giving up on a response from a remote node. Timeouts are common in networks be- 
cause, if a packet is lost due to an error, the originator should not wait indefinitely for a response that 
will never come. Establishing a timeout value is a compromise between giving up too quickly, and
thereby missing a slower-than-normal reply, and waiting too long, and thereby introducing
unacceptable delays in system functionality when a packet is lost. Depending on the time it takes to
send a packet on a network and the nodes’ typical response time, timeouts can range from
microseconds to minutes. Typical timeouts are on the order of milliseconds.
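
A lower bound on the reply timeout can be estimated from the time the packets occupy the wire. The
sketch assumes the 9600-bps rate and the Table 5.4 packet format of this section, plus a character
format of 1 start bit, 8 data bits, 1 parity bit, and 1 stop bit (the stop-bit count is an
assumption). The response_us parameter is a hypothetical allowance for the remote node's processing
time.

```c
#include <stdint.h>

/* Assumed character format: 1 start + 8 data + 1 parity + 1 stop bit. */
#define BITS_PER_CHAR 11u
#define HEADER_BYTES  3u   /* DA, SA, payload length */

/* Time on the wire, in microseconds (rounded down), for num_chars
   back-to-back characters. */
uint64_t wire_time_us(uint32_t num_chars, uint32_t bps)
{
    return (uint64_t)num_chars * BITS_PER_CHAR * 1000000u / bps;
}

/* Shortest sensible reply timeout: request time + reply time + an
   allowance for the remote node to process the request. */
uint64_t min_timeout_us(uint8_t req_payload, uint8_t reply_payload,
                        uint32_t bps, uint64_t response_us)
{
    return wire_time_us(HEADER_BYTES + req_payload, bps)
         + wire_time_us(HEADER_BYTES + reply_payload, bps)
         + response_us;
}
```

Under these assumptions a maximum-length packet (3 header bytes plus 255 payload bytes) occupies the
wire for about 296 ms, so a request and reply that are both at maximum length need roughly 0.6 s
before any processing time is added.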

When an originator times out and concludes that its requested data somehow got lost, it can
resend the request. If, for example, a security control node sends a request for a camera to pan across a
room, and that request is not acknowledged within half a second, the request can be retransmitted. 

In the case of a non-address error, the receiving node has enough information to send a reply back 
to the originator, informing it that the packet was not correctly received. Such behavior is desirable 
to enable the originator to retransmit the packet rather than waiting for a timeout before resending 
that data. 
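
The two error categories lead to a simple receive-side decision, sketched below in C. The per-field
parity flags in rx_status_t are a hypothetical summary of what the UART reported for one packet; the
type and function names are likewise hypothetical.

```c
#include <stdbool.h>

typedef enum { ACTION_IGNORE, ACTION_SEND_NAK, ACTION_PROCESS } rx_action_t;

typedef struct {
    bool da_ok;    /* destination address byte passed its parity check */
    bool sa_ok;    /* source address byte passed its parity check */
    bool rest_ok;  /* length byte and all payload bytes passed */
} rx_status_t;

rx_action_t classify_packet(rx_status_t s)
{
    if (!s.da_ok || !s.sa_ok)
        return ACTION_IGNORE;    /* recipient or sender unknown: drop silently */
    if (!s.rest_ok)
        return ACTION_SEND_NAK;  /* addresses intact: ask the sender to resend */
    return ACTION_PROCESS;       /* act on the packet and acknowledge it */
}
```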

The preceding details of a hypothetical RS-485 network must be gathered into network driver 
software to enable proper communication across the network. While hardware controls the detection 
of parity errors and the flow of bits, it is usually software that generates reply messages and counts 
down timeouts. Figure 5.17 distills this information into a single flowchart from which software rou- 
tines could be written. 

As seen from this flowchart, transmit and receive processes run concurrently and are related. The 
transmit process does not complete until a positive acknowledgement is received from the destina- 
tion node. This network control logic implemented in software is simple by mainstream networking 
standards, yet it is adequate for networks of limited size and complexity. Issues such as access shar- 
ing are handled inherently by the request/reply nature of this network, greatly simplifying the traffic 
patterns that must be handled by the software driver. 
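
The transmit side of such a driver can be sketched as a retry loop. The send_and_wait hook is
hypothetical, standing in for whatever routine actually transmits the packet and waits for a reply
or a timeout. The attempt limit is an added assumption: the flowchart's transmit process waits for a
positive acknowledgement, but practical software usually bounds the number of retries so that a
failed node is eventually reported. A scripted stand-in is included for testing.

```c
#include <stdbool.h>

typedef enum { REPLY_ACK, REPLY_NAK, REPLY_TIMEOUT } reply_t;

/* Hypothetical driver hook: transmits the pending packet, then waits up
   to timeout_ms for a reply and reports what happened. */
typedef reply_t (*send_and_wait_fn)(void *ctx, unsigned timeout_ms);

/* Resends after a NAK or a timeout. Returns true once a positive
   acknowledgement arrives, or false after max_attempts failures. */
bool send_reliable(send_and_wait_fn send_and_wait, void *ctx,
                   unsigned timeout_ms, unsigned max_attempts)
{
    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        if (send_and_wait(ctx, timeout_ms) == REPLY_ACK)
            return true;
        /* REPLY_NAK: the destination saw a non-address error.
           REPLY_TIMEOUT: the packet or its reply was lost.
           Either way, send the packet again. */
    }
    return false;
}

/* Scripted stand-in for a real driver: replays a fixed sequence of
   replies, then reports timeouts. */
typedef struct {
    const reply_t *replies;
    unsigned n, next;
} script_t;

reply_t scripted_send(void *ctx, unsigned timeout_ms)
{
    script_t *s = ctx;
    (void)timeout_ms;  /* a real driver would start a timer here */
    return s->next < s->n ? s->replies[s->next++] : REPLY_TIMEOUT;
}
```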



FIGURE 5.17 Hypothetical network driver flowchart.



5.11 INTERCHIP SERIAL COMMUNICATIONS



Serial data links are not always restricted to long-distance communications. Within a single
computer system, or even a single circuit board, serial links can provide attractive benefits as
compared to traditional parallel buses. Computer architectures often include a variety of
microprocessor peripheral devices with differing bandwidth requirements. Main memory, both RAM and ROM, is a
central part of computer architecture and is a relatively high-bandwidth element. The fact that the 
CPU must continually access main memory requires a simple, high-bandwidth interface — a parallel 
bus directly or indirectly driven by the CPU. Other devices may not be accessed as often as main 
memory and therefore have a substantially lower bandwidth requirement. Peripherals such as data 
acquisition ICs (e.g., temperature sensors), serial number EEPROMs, or liquid crystal display 
(LCD) controllers might be accessed only several times each second instead of millions of times per 
second. These peripherals can be directly mapped into the CPU’s address space and occupy a spot 
on its parallel bus, but as the number of these low-bandwidth peripherals increases, the complexity 
of attaching so many devices increases. 

Short-distance serial data links can reduce the cost and complexity of a computer system by re- 
ducing interchip wiring, minimizing address decoding logic, and saving pins on IC packages. In 
such a system, the CPU is connected to a serial controller via its parallel bus, and most other periph- 
erals are connected to the controller via several wires in a bus topology as shown in Fig. 5.18. 

Such peripherals must be specifically designed with serial interfaces, and many are. It is common 
for low-bandwidth peripheral ICs to be designed in both parallel and serial variants. In fact, some 
devices are manufactured with only serial interfaces, because their economics overwhelmingly
favor the reduction in logic, wiring, and pins of a serial data link. A temperature sensor with a serial
interface can be manufactured with just one or two signal pins plus power. That same sensor might 












